Concept | Definition | Key Points / Explanation |
---|---|---|
Cloud Computing (NIST Definition) | A model for enabling on-demand network access to a shared pool of configurable computing resources (e.g., servers, storage, networking, databases) that can be rapidly provisioned and released with minimal management effort or service provider interaction. |
On-Demand Self-Service | Ability to provision computing resources automatically, without requiring human interaction with the service provider. |
Broad Network Access | Services are available over the network and accessible through standard mechanisms (e.g., web browsers, mobile clients, APIs). |
Resource Pooling | The provider’s computing resources serve multiple customers via a multi-tenant model, while ensuring data privacy and security for each. |
Rapid Elasticity | The ability to scale resources up or down—often automatically—to match demand. |
Measured Service | Resource usage is monitored, measured, and reported, enabling a pay-for-use model. |
Concept | Definition | Key Points / Explanation |
---|---|---|
Public Cloud | Cloud services offered by third-party providers over the public internet. |
Multi-Cloud | Using multiple public clouds together (e.g., GCP + AWS + Azure) as part of one environment or strategy. |
Private Cloud | An on-premises cloud dedicated to a single organization, still meeting the five essential cloud characteristics. |
Hybrid Cloud | A single “cloud-like” environment that combines private cloud and public cloud. |
Service Model | Definition | Key Points / Explanation |
---|---|---|
IaaS (Infrastructure as a Service) | The vendor abstracts and manages the underlying data center, networking, servers, storage, and virtualization layers. You manage OS, runtime, apps, data. |
PaaS (Platform as a Service) | The vendor abstracts data center, networking, servers, storage, virtualization, and the runtime/operating system. You manage the application and data. |
SaaS (Software as a Service) | The vendor handles everything, delivering the software to you as a web or API-based service. |
XaaS (Anything as a Service) | An umbrella term for any service delivered over the cloud (e.g., FaaS, CaaS, DBaaS). |
Concept | Definition | Key Points / Explanation |
---|---|---|
Private Global Network | Google’s private high-bandwidth, low-latency network that interconnects its data centers worldwide. | - Traffic typically remains on Google’s private backbone, ensuring high performance and security. - Includes extensive fiber, points of presence (PoPs), and subsea cables across continents. |
Regions | Independent geographic areas that contain multiple zones. | - Each region can have several zones (usually 3+). - Inter-zone latency within a region is typically under 5 ms. - Deploying services across multiple zones in a region improves fault tolerance. |
Zones | The smallest deployment entity within a region, acting as an isolated failure domain. | - Resources (like Compute Engine instances) live in a specific zone. - Redundant design: if one zone goes down, others remain unaffected. |
Multi-Regions | Large geographic areas containing multiple regions. | - Used for maximum redundancy, distribution, or availability. - Data is stored and replicated across multiple regions within the multi-region. |
Points of Presence (PoP) | Edge locations or network entry points where traffic enters/exits Google’s backbone. | - Optimizes latency by routing requests to the nearest PoP. - Also known as Google’s “edge network.” |
Subsea Cables | High-capacity undersea fiber cables connecting continents. | - Google invests heavily in private subsea cables. - Enables fast, low-latency connectivity between major geographic areas. |
Service | Service Model | Definition | Key Points / Explanation |
---|---|---|---|
Compute Engine | IaaS (Infrastructure as a Service) | Virtual machines (VMs) running in Google’s data centers. | - Complete control over OS, software, libraries. - You manage patching and scaling (auto-scaling possible with instance groups). - Supports custom or public images, plus marketplace solutions. |
Google Kubernetes Engine (GKE) | CaaS (Container as a Service) | Managed Kubernetes environment for container orchestration. | - Runs on top of Compute Engine instances as worker nodes. - Automates container deployment, scaling, and management. - Based on open-source Kubernetes, so workloads are portable to on-prem or other clouds that also run Kubernetes. |
App Engine | PaaS (Platform as a Service) | Fully managed platform for building and hosting web apps. | - Auto-scales based on traffic. - Abstracts OS management, security updates, runtime patching. - Supports common languages (Java, Python, Go, Node.js, etc.) and custom runtimes. |
Cloud Functions | FaaS (Function as a Service) | Serverless environment to run short-lived functions triggered by events. | - No server management; pay only for execution time. - Integrates well with other GCP services (e.g., Cloud Storage triggers, Pub/Sub, etc.). - Good for lightweight, event-driven microservices, data processing, and real-time event handling. |
Cloud Run | Serverless (also considered FaaS/CaaS) | Fully managed compute for containerized apps, built on Knative. | - Deploy any container with your choice of language, runtime, or libraries. - Scales to zero when no traffic; scales up instantly on demand. - Often described as “serverless containers.” |
Service | Type | Definition | Key Points / Explanation |
---|---|---|---|
Cloud Storage | Object Storage | Scalable, durable, highly available object storage (documents, images, backups, etc.). | - 11 “nines” of durability (99.999999999%). - Multiple storage classes (Standard, Nearline, Coldline, Archive) for cost optimization based on access frequency. - Location options: Regional, Dual-Region, or Multi-Region. |
Filestore | File Storage | Fully managed NFS (Network File System) for sharing files across multiple Compute Engine VMs or GKE. | - NFS v3 compliant. - Useful when multiple VMs/containers need concurrent read/write access to the same shared file system. |
Persistent Disk | Block Storage | Durable block storage volumes for Compute Engine (VM) instances. | - Attached to a single VM for OS/data disk use. - Available as HDD (Standard) or SSD for higher IOPS and lower latency. - Zonal or regional replication options. - Disk types in GCP: Persistent Disks retain data even when the VM is stopped and come in standard (HDD) and SSD variants; Local SSDs are very fast disks physically attached to the VM host but lose their data when the VM stops; Boot Disks are persistent disks that hold the operating system the VM starts from. |
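
As a rough sketch (bucket name, region, and file are placeholders), the storage classes above are chosen at bucket-creation time, for example with gsutil:

```bash
# Create a Nearline bucket in a single region, copy a file in, and inspect it.
gsutil mb -l us-central1 -c nearline gs://my-example-backups-bucket
gsutil cp ./backup.tar.gz gs://my-example-backups-bucket/
gsutil ls -L -b gs://my-example-backups-bucket   # shows bucket metadata, incl. storage class
```
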
Service | Type | Definition | Key Points / Explanation |
---|---|---|---|
Cloud SQL | Relational (SQL) | Fully managed SQL database (MySQL, PostgreSQL, SQL Server). | - Automated backups, replication, patching, scaling. - Zonal high availability; can set up cross-region read replicas. |
Cloud Spanner | Relational (SQL) | Horizontally scalable, strongly consistent, globally distributed relational database. | - Handles high transaction volume with strong consistency. - Multi-region or even global replication. - Used for mission-critical apps needing global scale and ACID transactions. |
Bigtable | NoSQL (Wide-Column) | Fully managed, high-throughput, low-latency NoSQL database (based on Google’s internal Bigtable system). | - Good for large analytic or operational workloads (e.g., IoT, time-series data). - Scales to petabytes. - Cluster resizing without downtime. |
Datastore / Firestore | NoSQL (Document) | Schemaless, document-style databases often used for web/mobile/IoT apps. | - Datastore: multi-region replication, ACID transactions. - Firestore: near-real-time updates, offline support, easy integration with Firebase for mobile. - Both scale automatically and can handle millions of reads/writes. |
Memorystore | In-Memory Cache | Fully managed Redis or Memcached in-memory data store. | - Low-latency caching layer for frequently accessed data. - Helps scale read performance in high-traffic scenarios. - Unlike local SSDs (fast, VM-attached, ephemeral disks), Memorystore is a shared managed cache that multiple applications can reach over the network for rapid data access. |
Concept / Service | Definition | Key Points / Explanation |
---|---|---|
VPC (Virtual Private Cloud) | A virtualized global network that manages networking for GCP resources. | - Acts like your “virtual data center.” - Global scope spans all GCP regions. - You can create subnets per region, control IP ranges, and segment networks. - Each project has a default VPC; you can create additional VPCs if needed. |
Firewall Rules | Control inbound/outbound traffic at the instance level; rules are global resources applied per VPC network. | - Defaults exist (allow internal traffic, SSH, etc.), and you can define custom rules. - Stateful firewall; traffic is allowed or denied based on rules. |
Routes | Specify how traffic leaves an instance and gets routed to other destinations. | - Default route to the internet, plus additional routes for custom routing scenarios. - Work with firewall rules to manage traffic flow. |
Load Balancing | Distributes inbound traffic across multiple backends/instances to handle workloads efficiently. | - HTTP/HTTPS Load Balancing: Global, Layer 7 load balancing with content-based routing. - Network Load Balancing: Regional, Layer 4 load balancing for TCP/UDP traffic. |
Cloud DNS | Google’s managed DNS service, using the same infrastructure as Google’s own DNS. | - Create/maintain DNS records (A, AAAA, MX, CNAME, TXT, etc.) in managed zones. - Low-latency, high-availability DNS resolution. |
Cloud VPN | Secure IPsec connection between an on-premises network and a GCP VPC over the public internet. | - Encrypted traffic using VPN tunnels. - Ideal for lower-volume or basic hybrid scenarios. |
Dedicated Interconnect | Dedicated high-speed, private connection from an on-premises data center to GCP, bypassing the public internet. | - Provides low latency, high availability. - Suited for large data transfers, stable throughput needs. |
Peering (Direct/Carrier) | Connects your network to Google’s edge through a peering exchange or via a carrier partner. | - Direct peering: exchange traffic directly with Google at a peering facility. - Carrier peering: traffic flows to GCP through a partner’s network. |
Concept | Definition / Explanation |
---|---|
Resource | Any entity you use in GCP, e.g., Compute Engine VMs, Cloud Storage buckets, Cloud SQL instances, plus higher-level “account” resources (projects, folders, organization). |
Resource Hierarchy | A structure (organization → folders → projects → service-level resources) for organizing and managing cloud resources. |
Parent-Child Relationship | Policies/permissions set at a parent resource are inherited by its children. Each child has exactly one parent. |
Organization Node | The root of the hierarchy, associated with one G Suite/Cloud Identity domain. IAM policies set at this level apply across all folders/projects/resources. |
Folders | An optional grouping mechanism between organization and projects (e.g., by department or environment). Must have an organization node to use folders. Each folder can contain multiple child folders or projects, but any folder/project has exactly one parent. |
Projects | The required base-level grouping entity for using GCP services; all service resources belong to a single project. A project has exactly one parent: either a folder or, if no folders are used, the organization itself. |
Service-Level Resources | Actual resources you create (VMs, buckets, databases, etc.). They sit at the bottom of the hierarchy, inside a project. |
Labels | Key-value pairs that help organize and filter resources, especially for cost tracking. |
IAM Policy Inheritance | Setting an IAM policy at one level automatically applies that policy to child objects. E.g., a role assigned at the folder level flows down to all projects/resources under that folder. |
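
A minimal sketch of inheritance in practice, assuming a placeholder folder ID and user: granting a role on a folder makes it flow down to every project and resource beneath it.

```bash
# Grant a role at the folder level; it is inherited by all child projects/resources.
gcloud resource-manager folders add-iam-policy-binding 123456789012 \
    --member="user:dev-lead@example.com" \
    --role="roles/compute.viewer"

# Inspect the policy set directly on the folder.
gcloud resource-manager folders get-iam-policy 123456789012
```
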
Step / Concept | Definition / Explanation |
---|---|
Free Trial | - 12-month free trial with $300 USD credit (or local currency equivalent) to explore GCP. - Personal (individual) account only, not a business account. - Ends when the credits are used up or 12 months pass. |
Always Free | - Set of services/resources GCP provides at no cost indefinitely (within usage limits). - Available on any upgraded GCP account (i.e., after you’ve added billing details). - If usage goes beyond the free limits, standard billing applies. |
Requirements | - A new Gmail address (to avoid conflicts). - A valid credit/debit card (for verification). - (Optional) Use an Incognito/Private browser session to avoid accidental account collisions. |
Steps to Create | 1. Go to the Free Trial URL (cloud.google.com/free). 2. Create a new Google/Gmail account if needed. 3. Provide credit card info for identity verification (you are not charged unless you upgrade to a paid account). 4. Accept terms to start the free trial. |
Verification & Setup | - Google may send you a phone verification code. - Once complete, you see your free trial credit displayed in the GCP Console’s billing section. |
Concept | Definition / Explanation |
---|---|
Two-Step Verification | An extra layer of account security requiring something you know (your password) and something you have (phone/security key). If your password is compromised, attackers still need physical access to the second factor. |
Verification Methods | - Text/Voice Call: Google sends a code via SMS or phone call. - Authenticator App: Generates one-time codes (e.g., Google Authenticator). - Google Prompt: Approve sign-ins via push notification. - Security Key: Physical USB/NFC key for strong protection against phishing. |
Backup Codes | One-time codes you can download/print in case your phone/security key is unavailable. |
Best Practices | - Always enable multi-factor auth (MFA) for all admin and personal GCP accounts. - Use push notifications or security keys if possible (fewer SIM-swap risks). |
Feature | Definition / Explanation |
---|---|
Console Home Page | - Displays summary “cards” about recent activity, project info, billing, etc. - You can customize which cards appear. |
Navigation Menu | - “Hamburger” menu in the top-left corner gives access to all GCP services grouped by category (Compute, Networking, etc.). - You can “pin” frequently used services to the top for quick access. |
Search Bar | - Quickly locate services, APIs, or even specific resources by name. |
Project Selector | - Choose your active project in the top menu. Project-level resources (VMs, Cloud Functions, etc.) are accessed only under the currently selected project. |
Console Top Bar | - Activate Cloud Shell icon for terminal access to GCP resources. - Notifications (bell icon) for events/logs. - Help (question mark icon) for docs/keyboard shortcuts. |
Activity & Recommendations | - The Activity tab shows recent actions (creating resources, changing IAM, etc.). - Recommendations show cost or performance suggestions from GCP’s Recommender service (e.g., right-sizing VMs). |
Concept | Definition / Explanation |
---|---|
Cloud Billing Account | Tracks costs for GCP usage. Linked to at least one payment method (credit card or bank account). Can pay for multiple projects. |
Payments Profile | A Google-level resource storing payment methods, billing contacts, and legal information. Used across Google services (not just GCP). |
Billing Account Types | - Self-Service (Online): Credit/debit card charged automatically; invoices visible online. - Invoiced (Offline): Must qualify for invoice billing; pay via check/wire transfer and receive monthly invoices by mail or electronically. |
Sub-Accounts | Used by resellers to group charges (e.g., multiple customers) on a separate section of the invoice. Linked to a master billing account. |
Ownership & Linking | - A single organization owns a billing account (though it can pay for projects in different orgs). - A project without an attached billing account can only use GCP’s free services (limited usage). |
Roles & Permissions | Billing access is controlled by IAM roles, e.g., Billing Account Administrator, Billing Account Creator, Billing Account User, etc. |
Creating/Editing/Closing | - You can create new billing accounts (requires the Billing Account Creator role). - Link/unlink projects (Project Billing Manager plus appropriate billing roles). - Close billing accounts after detaching all projects. |
Concept | Definition / Explanation |
---|---|
Committed Use Discounts | You commit to a specific level of resources/usage for 1-3 years, in exchange for reduced hourly rates on those resources. |
Resource-Based Commitments | Commit to a certain amount of vCPUs, memory, GPUs, etc. in a particular region for Compute Engine. Discounts up to 57% for most machine types (up to 70% for memory-optimized). |
Spend-Based Commitments | Commit to a spending level ($ per hour) for specific services, such as Cloud SQL or VMware Engine. Discounts up to ~25-52% depending on 1-year or 3-year. |
Sustained Use Discounts | Automatic discounts for Compute Engine when you run resources (general purpose/memory-optimized VMs) for a substantial portion of the month (25%+). Scales up to 30% max discount. |
GCP Pricing Calculator | Web tool to estimate monthly costs for a planned architecture. Helps forecast spend in advance (cloud.google.com/products/calculator/). |
Budgets & Budget Alerts | - Define a budget amount for a billing account or specific projects. - Set thresholds (50%, 90%, 100%, etc.) of your budget to trigger email alerts. - By default, emails go to Billing Admins/Users; you can configure additional recipients via Cloud Monitoring or integrate with Pub/Sub for automated responses. |
Pub/Sub Integration | Programmatic notifications when budgets exceed thresholds. Example automations: shut down expensive resources, push Slack alerts, or freeze new deployments. |
Reservations | Reserve (and guarantee availability of) certain VM resources (e.g., a certain number of cores/CPUs in a region). Pairs well with committed use discounts for consistent, predictable workloads. |
Concept | Definition / Explanation |
---|---|
Billing Export to BigQuery | Automatically export granular billing data (usage cost details, pricing) from GCP to a BigQuery dataset. |
Daily Cost Detail | Exports daily usage and costs at a detailed level. |
Pricing Data | Optionally exports GCP’s list pricing information to BigQuery. |
Use Cases | - Analyze spend in BigQuery or visualize via tools like Looker Studio (Data Studio). - Helps with cost optimization, trend analysis, and custom dashboards. |
Important Note | Billing export is not retroactive; data is collected only after you enable this feature. |
Setup Steps | 1. Create or choose a BigQuery dataset. 2. Enable billing export in GCP Console (link dataset). 3. Enable the BigQuery Data Transfer Service API. 4. Data is updated daily (for cost detail). |
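
Once the export is enabled, the data can be queried like any other BigQuery table. A sketch using the bq tool (the dataset and export-table names are placeholders; the real table name is derived from your billing account ID):

```bash
# Total cost per service for the current invoice month from the billing export.
bq query --use_legacy_sql=false '
  SELECT service.description AS service, ROUND(SUM(cost), 2) AS total_cost
  FROM `my-project.billing_export.gcp_billing_export_v1_XXXXXX_XXXXXX_XXXXXX`
  WHERE invoice.month = FORMAT_DATE("%Y%m", CURRENT_DATE())
  GROUP BY service
  ORDER BY total_cost DESC'
```
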
Concept | Definition / Explanation |
---|---|
Cloud APIs | The set of GCP service endpoints that let you programmatically control and integrate GCP resources (e.g. Compute Engine API, BigQuery API, etc.). |
Enable an API | You must enable each API at the project level before you can use it (via Console, gcloud CLI, or Service Usage API). |
API Library | Console section listing available GCP APIs. Allows quick enabling/disabling. |
Monitoring / Quotas | API usage can be tracked in the API Dashboard (APIs & Services → Dashboard); project-wide quotas are managed under IAM & Admin → Quotas. Quotas help prevent excessive usage. |
Automation | Accessing Cloud APIs directly allows you to script or code solutions in your preferred language. gcloud/Console also use these APIs under the hood. |
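
A quick sketch of enabling APIs from the CLI (the chosen APIs are just examples):

```bash
# List APIs already enabled in the active project, then enable two more.
gcloud services list --enabled
gcloud services enable compute.googleapis.com bigquery.googleapis.com
```
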
Concept | Definition / Explanation |
---|---|
Super Admin Account | - Exists in G Suite / Cloud Identity setups. - Has irrevocable org-level permissions (can grant Organization Admin, etc.). - Should not be used for daily tasks (principle of least privilege). |
Personal Gmail Approach | - If you do not have G Suite / Cloud Identity, you may use personal Gmail accounts as separate “admin” or “user” accounts. - Each standalone Gmail account can have distinct billing & IAM roles. |
Billing Account Admin vs. User | - Billing Account Admin: Full control over billing (budgets, payment methods, etc.). - Billing Account User: Can link/unlink projects to the billing account but cannot modify payment methods or budgets. |
Principle of Least Privilege | Assign only the minimum required roles (e.g., a second user might only need Billing Account User if they just need to attach projects to billing). |
Steps to Add | 1. In Console, go to Billing → Account Management. 2. Add the new user’s email, select the appropriate role (e.g., Billing Account User). 3. New user can log in, create projects, attach to that billing account, etc. |
Concept | Definition / Explanation |
---|---|
Cloud SDK | A set of command-line tools (primarily gcloud, gsutil, and bq) for managing GCP resources. |
gcloud CLI | Main CLI tool for GCP. Allows you to create, update, delete resources (VMs, networks, etc.), manage IAM, billing, etc. |
User vs. Service Account Auth | - User Account: Tied to an individual’s Google identity. Good for interactive use on a single machine. - Service Account: Tied to a service identity. Often used for automation (scripts, CI/CD). |
Key Commands | - gcloud init: Initialize & authorize the SDK; creates a configuration. - gcloud auth login: Authorize using user credentials. - gcloud config: Manage configurations (set account, project, zone, etc.). - gcloud components: Install/update additional CLI components. |
Command Format | gcloud [component] [entity] [operation] [arguments] [flags] (e.g., gcloud compute instances create …). |
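
A minimal first-time setup sketch using the commands above (project ID and zone are placeholders):

```bash
gcloud init                                   # authorize and create a default configuration
gcloud auth login                             # (re)authorize user credentials if needed
gcloud config set project my-example-project  # choose the active project
gcloud config set compute/zone us-central1-a  # set a default zone
gcloud config list                            # review the active configuration
```
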
Concept | Definition / Explanation |
---|---|
Multiple Configurations | You can create named “profiles” of gcloud settings (e.g., default, master) to handle different accounts/projects. Switch with gcloud config configurations activate <NAME>. |
Auth & Accounts | - gcloud auth list: List all authorized accounts and show which one is active. - gcloud auth revoke <ACCOUNT>: Remove credentials for an account. |
Set Config Properties | gcloud config set <property> <value> (e.g., gcloud config set project my-project). Applies to the currently active configuration. |
Components | - gcloud components install <component>: Install optional tools (e.g., kubectl). - gcloud components list: See available components. - gcloud components update: Update all installed components to the latest version. |
Interactive Shell (beta) | gcloud beta interactive provides inline autocompletion, hints, and command documentation. |
Info & Logs | gcloud info : Shows details about your SDK installation, project, active account, and config location. |
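
A sketch of juggling two identities with named configurations (accounts and project IDs are placeholders); note that creating a configuration also activates it:

```bash
gcloud config configurations create work
gcloud config set account admin@example.com
gcloud config set project work-project-id

gcloud config configurations create personal
gcloud config set account me@gmail.com
gcloud config set project personal-project-id

gcloud config configurations list            # see all configurations and which is active
gcloud config configurations activate work   # switch back to the work profile
```
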
Concept | Definition / Explanation |
---|---|
Cloud Shell | A browser-based, ephemeral VM with the Cloud SDK & other dev tools preinstalled (git, Docker, Kubernetes tools, etc.). Authenticated as your user automatically. |
Persistent Home Directory (5 GB) | Each user gets 5 GB of persistent storage mounted to /home/<user> . Data remains intact across sessions but sessions themselves are ephemeral. After 1 hour of inactivity, the VM is reclaimed, but files in /home persist. |
Auto-Upgrade | Cloud Shell’s SDK components are updated weekly. |
Code Editor | Integrated via Eclipse Theia. Allows browsing, editing files in your Cloud Shell environment. |
Web Preview | Lets you preview web apps running in Cloud Shell on a secure proxy (ports typically 8080 or 8081). Accessible only to your logged-in user. |
Customization | - You can auto-install extra tools by creating a .customize_environment script in your home directory. - This script runs at session startup (e.g., to install Terraform, Helm, etc.). |
Quota & Limits | - 50 hours/week usage limit. - If idle for 1 hour, the session is terminated. - If you don’t use Cloud Shell for 120 days, your home disk is deleted (with warning). |
Concept | Definition / Explanation |
---|---|
Creating a Project | - Each project is a separate namespace for resources. - Must have a billing account linked (unless using only free-tier services). |
Switching Projects | - GCP Console “Project Selector” or gcloud config set project <PROJECT_ID>. - Each project has a unique ID (automatically assigned or custom). |
Linking to Billing | - Users need appropriate billing roles to link a project to a billing account (e.g., Billing Account User + Project Owner). |
Permissions | - Projects can be shared with other Google accounts via IAM roles. - E.g., Project Editor, Project Owner, Project Viewer. |
Managing Multiple Projects | - Best practice to isolate environments, e.g., dev/test/prod in separate projects. - Each project can have distinct roles, budgets, APIs enabled, etc. |
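
A sketch of the project workflow from the CLI (project ID and billing account ID are placeholders; linking billing requires the roles noted above, and older SDK versions may expose the billing commands under gcloud beta):

```bash
# Create a project, make it the active one, and link it to a billing account.
gcloud projects create my-dev-project-12345 --name="Dev Project"
gcloud config set project my-dev-project-12345
gcloud billing accounts list
gcloud billing projects link my-dev-project-12345 --billing-account=000000-AAAAAA-BBBBBB
```
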
Concept | Definition / Explanation |
---|---|
Quotas | Resource usage limits for APIs/services at the project level (e.g., number of VMs, load balancers, requests per day). |
Types of Quotas | - Rate Quotas: e.g., requests per second, per day (resets after time). - Allocation Quotas: e.g., max number of VMs or CPU cores (must manually free up by deleting resources). |
Purpose | - Protect GCP users from accidental usage spikes. - Manage resource distribution among many customers. - Provide a limit you can request to raise if needed. |
Viewing Quotas | 1. Quotas Page (IAM & Admin → Quotas) for a full project-wide list. 2. API Dashboard (APIs & Services → select an API → Quotas) for per-API usage over time. |
Requesting Increases | - Select the quota, click “Edit Quotas,” specify new limit, and submit to Google for approval. - Approval is often within ~2 business days. |
Quota Monitoring | - Some services expose quota metrics in Cloud Monitoring (e.g., Compute Engine). - You can create custom dashboards and alerts for near-quota usage. |
Errors | Hitting a quota limit typically returns HTTP 429 (Too Many Requests) or the gRPC status RESOURCE_EXHAUSTED. |
Concept | Definition / Explanation |
---|---|
Principle of Least Privilege | Grant only the minimum necessary permissions to users/services. Avoid broad roles (e.g., Owner, Editor) in favor of more granular ones. |
IAM | “Identity and Access Management” in GCP. Manages who (member) has what (role/permission) on which resource. |
Policy | Collection of statements/bindings specifying which members get which roles (and under what conditions). Attached to a resource (organization, folder, project, or resource). |
Binding | Binds a role to one or more members, plus optional conditions. |
Metadata (ETag, Version) | - Etag: Concurrency-control token that changes each time the policy is updated. - Version: Specifies the policy schema version (1 and 3 are common); version 3 supports conditions. |
Audit Config | Specifies which permission types (Admin Read, Data Read, Data Write) get logged, and which identities are exempted. |
Member Type | Definition / Explanation |
---|---|
Google Account | A user with a Google identity (e.g., gmail.com or a managed account in your domain). |
Service Account | Special account for applications/VMs, not tied to a human. Used to authenticate workloads (e.g., GCE, GKE pods) to other GCP services. |
Google Group | A named collection of accounts/service accounts. Granting roles to the group implicitly grants them to all members. |
G Suite/Cloud Identity Domain | Represents all users under a specific domain (e.g., my-company.com). Can manage domain users centrally. |
allAuthenticatedUsers | Anyone with a Google Account/Service Account authenticated with Google. |
allUsers | Anyone on the internet (anonymous & authenticated). Highly risky—grants public access. |
Concept | Definition / Explanation |
---|---|
Permission | Action allowed on a service (e.g., compute.instances.start ). Typically follows the pattern service.resource.verb . |
Role | Named collection of permissions. You cannot grant permissions directly; you grant roles to members. |
Primitive Roles | Owner, Editor, Viewer. Very broad. Apply at project level. Google recommends avoiding them except for small cases—prefer more granular roles. |
Predefined Roles | Roles curated by Google for specific services. Provide fine-grained permissions. E.g., compute.instanceAdmin.v1 , storage.objectViewer . |
Custom Roles | User-defined roles bundling specific permissions. Not automatically updated by Google. Created at org or project level. Let you tailor exactly which permissions are included. |
Launch Stage (Custom Roles) | Each custom role has a stage: alpha, beta, or GA. Mainly for internal lifecycle tracking. |
Concept | Definition / Explanation |
---|---|
Hierarchy | Org → Folders → Projects → Resources. A resource inherits the union of policies from higher levels. |
Effective Policy | Union of the resource’s own policy + all inherited policies from ancestors. |
Policy Versions | - v1: Standard (no conditions). - v2: Internal to Google. - v3: Supports conditions. |
Condition | A logic expression restricting the role binding to specific context (e.g. time-based, IP-based). If condition is false, no access is granted. |
Time-Based Access | Example: Grant role only until a certain date/time, or only during specific hours. |
Resource-Based Access | Example: Grant roles only for certain resource name patterns or regions. |
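
A sketch of a time-bound (conditional) binding, assuming a placeholder project, user, and expiry date; conditions require policy version 3:

```bash
# Grant read access to Cloud Storage objects only until the end of 2025.
gcloud projects add-iam-policy-binding my-example-project \
    --member="user:contractor@example.com" \
    --role="roles/storage.objectViewer" \
    --condition='expression=request.time < timestamp("2025-12-31T00:00:00Z"),title=expires-end-2025'
```
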
Concept | Definition / Explanation |
---|---|
Service Account | Non-human account for apps/VMs to access GCP APIs. Identified by unique email (e.g., my-sa@my-project.iam.gserviceaccount.com ). |
Types | - User-Managed: You create & manage them. - Default: Auto-created for GCE/App Engine with the Editor role by default. - Google-Managed: For internal Google services (service agents). |
Authentication (Keys) | - Google-managed keys: Private portion never exposed, automatically rotated. - User-managed keys: You hold the private key. Must rotate & secure it yourself (risk of compromise). |
Service Account Permissions | Service accounts can be granted roles (i.e., they’re an identity). Also, controlling who can “act as” (impersonate) a service account is crucial (via the “Service Account User” role). |
Access Scopes | Legacy mechanism for granting permissions on default service accounts. The modern approach is to use IAM roles on the service account. |
Best Practices (Service Accounts) | - Use a separate service account per application component. - Avoid using default service accounts in production—create custom ones with minimal roles. - Rotate external (user-managed) keys frequently. - Keep keys out of source code. |
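
A sketch of those best practices in CLI form (project, service account, and instance names are placeholders): create a dedicated service account, grant it one narrow role, and attach it to a VM rather than exporting a key.

```bash
gcloud iam service-accounts create app-backend-sa \
    --display-name="App backend service account"

# Grant one narrowly scoped role instead of Editor.
gcloud projects add-iam-policy-binding my-example-project \
    --member="serviceAccount:app-backend-sa@my-example-project.iam.gserviceaccount.com" \
    --role="roles/cloudsql.client"

# Attach the service account at VM creation time; no key file needed.
gcloud compute instances create backend-vm --zone=us-central1-a \
    --service-account=app-backend-sa@my-example-project.iam.gserviceaccount.com \
    --scopes=cloud-platform
```
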
Concept | Definition / Explanation |
---|---|
Cloud Identity | Google’s Identity-as-a-Service solution. Centrally manages users/groups, enforces policies (2SV, password rules), SSO, device management, and more. |
Device Management | Enforce security policies on users’ mobile or desktop devices, e.g. passcodes, wiping corporate data on departure. |
Security Features | - 2-Step Verification: Mandate strong multi-factor authentication. - Password Policies: Centralized control of password complexity, rotation, etc. |
Single Sign-On (SSO) | Users log in once with corporate credentials (Cloud Identity or G Suite) to access multiple apps. Supports SAML, OAuth, OpenID, AD FS, etc. |
Reporting & Audits | Audit logs for user logins, group changes, device changes. Export to BigQuery for analysis. |
Directory Management | Sync with on-prem or external identity providers (Active Directory, LDAP) using Google Cloud Directory Sync (GCDS). |
Google Cloud Directory Sync | A tool that synchronizes user accounts, groups, and directory data from on-premises LDAP directories to Google Cloud services. |
Identity federation | A set of protocols and practices that enable an external identity provider to authenticate users, allowing access to multiple systems using a single set of credentials. |
Best Practice | Explanation |
---|---|
Use Least Privilege | Grant only necessary permissions—prefer narrower roles (e.g., predefined) over broad roles (Owner/Editor). |
Use Groups | Assign IAM roles to Google groups rather than individual users. Makes membership changes simpler without editing the policy. |
Set Policies at the Appropriate Level | E.g., if you only need to grant roles for a single project, don’t do it at the organization or folder level. |
Control Service Account Creation | Limit who can create or manage service accounts—because someone who can impersonate a high-privilege service account can access all resources that account has. |
Rotate Keys | For any user-managed service account keys, rotate them periodically to prevent compromise. |
Check Audit Logs | Monitor logs for suspicious policy changes and/or key usage. Export them to Cloud Storage or BigQuery for long-term retention. |
Minimize Default SA Usage | Don’t rely on default service accounts (often have broad Editor role). Create custom SAs with narrower permissions. |
Mirror Org Structure | Use folders/projects to match your organization’s departments/teams for logical separation and policy inheritance. |
Concept / Layer | Definition / Explanation |
---|---|
OSI Model (7 Layers) | A conceptual model for how data moves through a network: Physical → Data Link → Network (Layer 3) → Transport (Layer 4) → Session → Presentation → Application (Layer 7). |
IPv4 Addressing | 32-bit address written in dotted-decimal form (e.g., 192.168.0.1). Divided into network + host portions. RFC 1918 private ranges: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16. |
CIDR (Classless Inter-Domain Routing) | Replaces “classful” A/B/C approach with flexible prefix notation (e.g., /16 ). The larger the slash number, the smaller the network size, e.g. /24 = 256 addresses; /16 = 65,536 addresses. |
IPv6 Addressing | 128-bit hexadecimal notation (e.g., 2001:0db8:85a3::8a2e:0370:7334 ). Can shorten zero blocks with :: . Uses /64 for many typical subnets. |
Transport (Layer 4) | - TCP (Transmission Control Protocol) ensures reliable, ordered delivery. - UDP (User Datagram Protocol) is a simpler, connectionless protocol often used for DNS or streaming. |
Application (Layer 7) | Protocols like HTTP(S), DNS, SSH, SMTP, etc. The highest layer where user-facing apps / services operate. |
Concept | Definition / Explanation |
---|---|
VPC Overview | A global, software-defined network in GCP. Spans all regions. Contains subnets (regional). Allows internal communication over private IPs within the same VPC. |
Global Resource | VPC itself is a global resource, but subnets are per-region. |
Default Network | Created automatically in new projects (unless disabled by an org policy). It’s an auto mode VPC with one predefined subnet per region (using 10.128.0.0/9 block). Includes default firewall rules for SSH, RDP, ICMP, and internal traffic. |
Auto Mode vs. Custom Mode | - Auto Mode: Automatically creates one subnet per region. Subnets are assigned from the 10.128.0.0/9 range. Can be converted to custom mode.- Custom Mode: No subnets by default; you manually create subnets & define IP ranges. Recommended for production. |
VPC Peering / VPN | Separate VPCs typically can’t communicate via internal IPs unless you set up VPC peering or a VPN / Interconnect. |
Default Firewall Rules | Default VPC includes rules allowing inbound SSH, RDP, ICMP from any source, and all protocols/ports inside the network. Modify or remove as needed for security. |
Concept | Definition / Explanation |
---|---|
Subnets | Regional partitions of a VPC network’s IP space. Contain primary (and optionally secondary) IP ranges. |
Primary vs. Secondary Range | - Primary: The main CIDR block used for VM instance IP assignments. - Secondary: (Optional) Additional CIDR blocks for scenarios like container alias IPs, etc. |
Subnet Expansion | You can expand a subnet’s IP range without downtime, as long as it doesn’t overlap with existing subnets. Once expanded, it cannot be reverted to a smaller range. |
Auto Mode Subnets | Created automatically for each region with default CIDR blocks. Each region’s block can be expanded (up to /16), or you can convert the entire VPC to Custom Mode for more control. |
Reserved IPs | Each subnet’s primary range reserves 4 IP addresses (network, default gateway, future use, broadcast). Secondary ranges do not have reserved IPs. |
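
A sketch of a custom-mode VPC and subnet (names and CIDR ranges are placeholders), including a secondary range, Private Google Access, and a later range expansion:

```bash
gcloud compute networks create my-custom-vpc --subnet-mode=custom

gcloud compute networks subnets create app-subnet \
    --network=my-custom-vpc --region=us-central1 \
    --range=10.0.1.0/24 \
    --secondary-range=pods=10.4.0.0/20 \
    --enable-private-ip-google-access

# Expand the primary range later if needed (it cannot be shrunk back afterwards).
gcloud compute networks subnets expand-ip-range app-subnet \
    --region=us-central1 --prefix-length=23
```
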
Concept | Definition / Explanation |
---|---|
Routes | Define how traffic exits a VM to reach a destination (either inside or outside the VPC). Each route has a destination range + next hop. |
System-Generated Routes | - Default Route: 0.0.0.0/0 → Default Internet Gateway. Priority 1000. You can remove/replace it if you want full isolation. - Subnet Routes: One route per subnet’s primary and secondary range. Priority 0, more specific than the default route. Cannot be removed separately. |
Custom Routes | - Static: Manually created or set up with policy-based VPN. - Dynamic: Managed by Cloud Router (using BGP for Cloud VPN/Interconnect). |
Routing Priority | Lower number = higher priority. For identical destination ranges, the route with the smallest priority value wins. |
Private Google Access | VM instances without external IPs can still access Google APIs/services by enabling this on their subnet. Traffic to Google stays on Google’s backbone rather than going out to the public internet. |
Use Cases for PGA | - Subnet without external IP addresses. - On-prem to GCP via VPN/Interconnect. - GCP serverless or VPC peering (private services access). |
Concept | Definition / Explanation |
---|---|
Internal vs. External IP | - Internal IP: Reachable only within the same VPC (private). - External IP: Reachable from the public internet (if firewall allows). |
Ephemeral vs. Static | - Ephemeral: Auto-assigned, released when resource is stopped/deleted. - Static: Reserved and remains allocated to your project until released. |
Internal IP Allocation | - Automatically assigned from subnet’s IP range. - You can specify an address or reserve one. - Alias IP ranges let you define multiple IPs on a VM (e.g., container pods). |
External IP Allocation | - Ephemeral assigned if you launch a VM with external access (and you don’t specify a static one). - Can reserve a static external IP (regional or global). Regional → used by VMs / LBs in that region. Global → used by global LBs. |
Promotion (Ephemeral → Static) | You can take an ephemeral IP (internal or external) in use by a resource and promote it to static so it won’t change. |
Bringing Your Own IP (BYOIP) | You can import your own publicly routable IP prefixes (min /24 ) to GCP. Must prove ownership. |
Action | Definition / Explanation |
---|---|
Reserve a Static Internal IP | 1. On VM creation (Console → Networking → Reserve static internal IP). 2. Or create VM with ephemeral IP, then promote it to static. 3. Must be within the subnet’s CIDR range. |
Reserve a Static External IP | 1. Go to VPC Network → External IP addresses → Reserve static address. 2. Assign it to a VM or load balancer. 3. (Optional) Promote ephemeral external IP to static. |
Promote Ephemeral → Static | Convert a currently-used IP to a persistent assignment (for internal or external addresses). Prevents IP changes on VM stop/start. |
Deleting / Releasing | Remember to remove static IPs when no longer needed; otherwise, they incur charges even if unattached. - For internal IPs, use gcloud or re-assign in the networking settings. - For external IPs, “Release static address” in the console or via gcloud. |
gcloud compute addresses | - list: View addresses in your project (internal & external). - create: Reserve a static IP. - delete: Release it. |
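
The same actions as gcloud commands (address name and region are placeholders):

```bash
# Reserve, inspect, and release a regional static external IP.
gcloud compute addresses create web-frontend-ip --region=us-central1
gcloud compute addresses list
gcloud compute addresses describe web-frontend-ip --region=us-central1
gcloud compute addresses delete web-frontend-ip --region=us-central1   # release when no longer needed
```
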
Concept | Definition / Explanation |
---|---|
Distributed Firewall | Each VPC has a distributed, stateful firewall at the VM level. Rules apply inbound (ingress) or outbound (egress). |
Implied Rules | - Allow Egress: All outbound traffic is permitted unless blocked by a higher-priority rule. - Deny Ingress: All inbound traffic is denied unless allowed by a firewall rule. |
Default Rules | In the default VPC, rules allow ICMP, RDP(3389), SSH(22) inbound from any source and all protocols within the network. Priority 65534. |
Firewall Rule Components | - Direction: Ingress or Egress. - Action: Allow or Deny. - Targets: Which VMs (all, by tags, by service account). - Source/Dest: IP ranges, tags, or service accounts. - Protocols/Ports: e.g. tcp:22, icmp, etc. - Priority. |
Stateful | Once a connection is allowed, the return traffic is automatically allowed (connection tracking). |
Enable / Disable | You can disable a rule without removing it (handy for troubleshooting). |
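
A sketch of two typical rules for a setup like the custom-VPC lab described next (network name, tag, and ranges are placeholders):

```bash
# Allow SSH from anywhere, but only to instances tagged "public".
gcloud compute firewall-rules create allow-ssh-public \
    --network=my-custom-vpc --direction=INGRESS --action=ALLOW \
    --rules=tcp:22 --source-ranges=0.0.0.0/0 --target-tags=public

# Allow all internal traffic between instances on the network's private ranges.
gcloud compute firewall-rules create allow-internal \
    --network=my-custom-vpc --direction=INGRESS --action=ALLOW \
    --rules=tcp,udp,icmp --source-ranges=10.0.0.0/8
```
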
Action | Definition / Explanation |
---|---|
Create Custom VPC | 1. VPC Network → Create VPC. 2. Choose Custom subnet mode (no automatic subnets). 3. Manually add subnets, specifying region + CIDR. |
Add Public / Private Subnets | E.g., “public” subnet with external IP usage, “private” subnet with no external IP addresses. |
Enable Private Google Access | Allows VMs with no external IP to still reach Google APIs/services (Cloud Storage, etc.) over internal IP. - Turned on at the subnet level. |
Create Instances | - Public instance: ephemeral or static external IP, can reach internet. - Private instance: no external IP. Must rely on private Google Access or direct connection from the public instance to reach outside resources. |
Firewall Rules | - E.g., allow SSH from 0.0.0.0/0 to public instances, allow internal traffic from public→private. - Use target tags (e.g., “public” / “private”) to limit scope. |
Verification | - SSH into public instance from internet. - From public → private instance (SSH or ping). - Check Cloud Storage access from private instance via private Google Access (no external IP). |
Concept | Definition / Explanation |
---|---|
VPC Peering | Privately connect two VPC networks (in the same or different projects/orgs) so their internal IPs can talk without traversing the public internet. |
Supported | - All subnet routes are exchanged. - Optionally, custom static routes can be imported/exported. - Reduces egress costs, latency, and improves security. |
Restrictions | - No transitive peering (A↔B, B↔C doesn’t imply A↔C). - Subnet IP ranges must not overlap. - Each side must configure the peering, must be “active” on both sides. |
Separate Admins | Each VPC is managed independently (its own firewall rules, routes, etc.). Peering simply provides private connectivity. |
Demo Steps | 1. Create two custom VPC networks (e.g., NetA, NetB) in separate projects. 2. Create VMs in each (firewall rules to allow SSH/ICMP). 3. Under VPC Peering, create connection from NetA to NetB, then from NetB to NetA. 4. Test connectivity by pinging internal IPs. |
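
A sketch of the peering demo in CLI form (project and network names are placeholders; each side must create its half of the peering before it becomes active):

```bash
# From project-a: peer net-a with net-b in project-b.
gcloud compute networks peerings create neta-to-netb \
    --network=net-a --peer-project=project-b --peer-network=net-b

# From project-b (or with --project): create the matching half.
gcloud compute networks peerings create netb-to-neta \
    --network=net-b --peer-project=project-a --peer-network=net-a \
    --project=project-b

gcloud compute networks peerings list --network=net-a   # confirm the peering is ACTIVE
```
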
Concept | Definition / Explanation |
---|---|
Shared VPC | Lets multiple projects share a common VPC network in a “host project.” Service projects attach to the host’s shared VPC. Instances in service projects get IP addresses from the host’s shared subnets. |
Host Project | Contains the shared VPC network (one or more). Must belong to an organization. Administrators of the host project can grant subnets to service projects. |
Service Project | Project “attached” to a host project’s shared VPC. VMs created in the service project can use subnets from the host’s shared network. |
Roles | - Shared VPC Admin: Can enable host projects, attach service projects, and delegate subnet usage. - Service Project Admin: Manage resources in the service project. May have project-level or subnet-level usage permissions on the host project. |
Use Cases | 1. Simple Shared VPC: Single host project with multiple service projects. 2. Multiple Host Projects: e.g., dev vs. prod. 3. Hybrid: On-prem + host project with shared subnets. 4. Multi-tier: Different service projects for web vs. back-end tiers. |
Standalone Project | Neither a host nor a service project. Uses its own VPC as normal. |
Concept | Definition / Explanation |
---|---|
VPC Flow Logs | Captures a sample of network flows to and from VM instances (including GKE nodes). Used for real-time visibility into traffic, forensics, capacity planning, cost optimization, etc. |
Enable on Subnet | Flow logs are enabled on a per-subnet basis. All VMs in that subnet then produce flow logs in real time. |
Sampling Rate | Approximately 1 out of every 10 packets is captured. This primary sampling rate is set by Google Cloud and cannot be changed, though you can configure a secondary sampling rate (0.0–1.0) per subnet to reduce how many of the collected flows are actually logged. |
Data Export | - Cloud Logging for 30 days (by default). - Can export logs to Cloud Storage for longer retention or to BigQuery for analysis. |
Use Cases | - Network Monitoring (throughput, performance). - Real-Time Security (send logs to SIEM systems, detect anomalies). - Forensics (trace suspicious IP traffic). - Cost / Capacity (see traffic flows, optimize egress). |
Log Format | - Base fields (always included) plus optional metadata fields (e.g., GKE annotations). - Can filter logs to only store what you need. - Viewed in Cloud Logging (classic/preview logs viewer). |
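
A sketch of enabling flow logs on an existing subnet (subnet name and region are placeholders; the sampling and aggregation flags are optional):

```bash
gcloud compute networks subnets update app-subnet \
    --region=us-central1 \
    --enable-flow-logs \
    --logging-flow-sampling=0.5 \
    --logging-aggregation-interval=interval-5-sec
```
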
Concept | Definition / Explanation |
---|---|
Domain Name System (DNS) | A global, hierarchical, distributed database for translating human-friendly domain names (e.g., google.com) into IP addresses (e.g., 172.217.x.x). |
Root (.) | The top of the DNS hierarchy—13 root server addresses (each heavily replicated via anycast) that refer resolvers to the TLD name servers (like .com, .net). |
Top-Level Domain (TLD) | E.g., .com, .org, .net (generic TLD), or .uk, .ca (country-code TLD). TLD name servers point to authoritative name servers for your domain. |
Second-Level Domains | Your registered domain (e.g., bowtieinc.co). Often purchased via a domain registrar. Typically delegates to subdomains (e.g., dev.bowtieinc.co). |
DNS Resolver | A server (often ISP-provided) that recursively queries DNS on your behalf. Caches results based on TTL to speed up subsequent requests. |
Zone File | Contains DNS records (A, AAAA, MX, CNAME, etc.) for a domain (a zone). Hosted by an authoritative name server. |
Caching & TTL | Resolvers store DNS records in memory for a “time to live” period. Low TTL = more frequent updates but more queries. High TTL = fewer queries, but changes propagate slower. |
Lookup Steps | 1. Client queries DNS resolver. 2. Resolver contacts root name server, then TLD server, then authoritative server. 3. Authoritative server returns IP address. 4. Resolver caches result, returns to client. |
Record Type | Definition / Explanation |
---|---|
NS (Name Server) | Specifies the authoritative name servers for a domain. E.g., the domain’s DNS is served by ns1.example.com , ns2.example.com . |
A / AAAA | - A: Maps a domain to an IPv4 address. - AAAA: Maps a domain to an IPv6 address. |
CNAME | Canonical name record. Points one domain name to another. E.g., ftp.bowtieinc.co → bowtieinc.co . |
TXT | Holds arbitrary text data, often used for domain ownership verification (e.g., Google Workspace), SPF/DKIM records for email security, or other meta info. |
MX (Mail eXchange) | Specifies mail server(s) for handling email for a domain. Includes a priority value (lower = higher priority). E.g., bowtieinc.co. IN MX 10 mail.bowtieinc.co. |
PTR (Pointer) | Reverse DNS record. Maps an IP address back to a domain name. Stored in special .in-addr.arpa (IPv4) or .ip6.arpa (IPv6) zones. Often used for logging, spam checks. |
SOA (Start of Authority) | Holds zone-level data, e.g., admin email, serial number, and refresh intervals. Exactly one per zone. Ensures correctness and zone authority. |
Concept | Definition / Explanation |
---|---|
NAT | Translates private (RFC 1918) IP addresses to a public IP (or pool of public IPs) to enable internet-bound traffic. Also can hide real IP addresses for security. |
Static NAT (1-to-1) | A private IP is permanently mapped to a single public IP. Outbound & inbound traffic can occur using that mapped public IP. Often used if a device must be reachable externally on a stable IP. |
Dynamic NAT (Many-to-Few) | A pool of public IPs is shared among private addresses. IPs from the pool are allocated on demand. Released back into the pool after usage. |
PAT (Port Address Translation) (Many-to-1) | Multiple private IPs share a single public IP. NAT device uses unique source ports to track connections. E.g., typical home router scenario, also used by GCP’s Cloud NAT. |
Use Cases | - Dealing with limited public IP addresses. - Securing private networks from direct internet exposure. - Common home/office router scenario. |
Concept | Definition / Explanation |
---|---|
Cloud DNS | Google’s managed authoritative DNS service. Fully distributed, high availability (globally). Manages DNS zones & records for domains. |
Public vs. Private Zones | - Public: DNS data is visible over the internet. Typically used for external domain hosting. - Private: DNS data accessible only from within your VPC network(s). |
Managed Zones | A “DNS zone” hosted by Google’s DNS name servers. You create records (A, CNAME, MX, etc.) for your domain. - Public zone usage requires domain purchase from a registrar (not provided by Cloud DNS). |
Authoritative Name Servers | Cloud DNS automatically allocates name servers for your zone. You update your domain’s NS records at your registrar to point to these. |
Records & Record Sets | Within a zone, you define resource record sets (e.g., www → A record). An “SOA” and “NS” record are created by default. |
Usage | - Host a public DNS domain (point domain registrar’s NS to Cloud DNS). - Host a private DNS zone for internal name resolution (works only within your VPC or with DNS peering). |
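
A sketch of hosting a public zone (domain, zone name, and IP are placeholders); the last command prints the name servers to configure at your registrar:

```bash
gcloud dns managed-zones create example-zone \
    --dns-name="example.com." --description="Public zone for example.com"

gcloud dns record-sets create www.example.com. \
    --zone=example-zone --type=A --ttl=300 --rrdatas=203.0.113.10

gcloud dns managed-zones describe example-zone --format="value(nameServers)"
```
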
Concept | Definition |
---|---|
Private Google Access | Enables VM instances without external IPs to access Google APIs and services through internal connectivity. |
Private Service Connect | Provides internal endpoints in a VPC network to privately connect to managed services using internal IP addresses. |
VPC Service Controls | Enhances security by establishing a security perimeter around Google Cloud resources to mitigate data exfiltration risks. |
Serverless VPC Access | Allows serverless environments to securely connect to VPC networks using internal IP addresses without requiring a public IP connection. |
Concept | Definition / Explanation |
---|---|
Bare Metal Model | One OS running directly on the server hardware. Typically not flexible, underutilizes resources. |
Hypervisor | A software layer (also called a VMM—Virtual Machine Monitor) that enables multiple OSes (VMs) to share and manage the same host hardware. |
Full Virtualization | Emulates all hardware in software. Early approaches used binary translation, which was slow. |
Para-Virtualization | Modified guest OS communicates directly with hypervisor (no full emulation). Improves performance, but requires guest OS changes. |
Hardware-Assisted Virtualization | Modern CPUs have virtualization extensions (Intel VT-x, AMD-V). Hypervisor leverages these to run unmodified OSes efficiently. Reduces overhead (no heavy binary translation). |
Kernel-Level Virtualization | The hypervisor is part of the OS kernel itself (for example, KVM on Linux). VMs are treated like user-space processes. This approach powers GCP’s Compute Engine (with nested virtualization support). |
Nested Virtualization | Running a hypervisor (and VMs) inside another VM. Google’s kernel-level virtualization approach supports this. Useful for migrating on-prem VM images without big changes. |
Benefits | - Better Resource Utilization (multiple OSes on same hardware). - Isolation (one VM crash doesn’t affect others). - Flexibility (spin up VMs on demand). |
Concept | Definition / Explanation |
---|---|
Compute Engine | Google Cloud’s IaaS offering for running VMs (“instances”). Google manages the underlying hardware, data centers, networking, etc. |
VPC Integration | Instances live in a VPC subnet. Must choose region/zone upon creation, attach disk, and configure networking. |
Pricing | Pay per second (minimum 1 minute). Sustained use discounts or Committed use discounts can reduce cost. |
Core Configuration | 1. Machine Type (vCPU + memory). 2. OS Image (public, custom, marketplace). 3. Disk Type (standard, SSD, local SSD). 4. Network (VPC, subnets, firewall rules). |
Multitenant vs. Sole-Tenant | - Multitenant: Default. The physical host is shared with other customers. - Sole Tenant: Dedicated physical host (for compliance or performance reasons) at higher cost. |
Action | Definition / Explanation |
---|---|
Name & Labels | - Name: Unique within the project. - Labels: Key-value pairs to help organize resources (e.g. env=dev). |
Region & Zone | Choose a region, then a zone for your VM. The zone is fixed at creation; moving an instance later means recreating it (e.g., from snapshots) or using a move operation. |
Machine Configuration | - Choose from pre-defined (general purpose, compute/mem optimized) or custom machine types. - Possibly add GPUs (for n1 type). |
Boot Disk | - Select OS image from “public images,” custom images, or marketplace solutions. - Choose disk type (standard, balanced, SSD) and size. |
Management / Security / Networking | - Management: Add startup scripts, availability policies, etc. - Security: Shielded VM, OS Login, disabling project-wide SSH keys. - Networking: Subnet, external IP, network tags, etc. |
SSH / RDP | If Linux: typically SSH on port 22. If Windows: RDP on port 3389. Must have firewall rules to allow inbound traffic. |
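
A sketch of creating and connecting to a small Linux VM with the options above (all names are placeholders):

```bash
gcloud compute instances create web-server-1 \
    --zone=us-central1-a \
    --machine-type=e2-small \
    --image-family=debian-12 --image-project=debian-cloud \
    --tags=public --labels=env=dev

gcloud compute ssh web-server-1 --zone=us-central1-a   # connect once it is running
```
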
Machine Family | Definition / Explanation |
---|---|
General Purpose | Balanced CPU/memory. Good for a wide variety of workloads like web apps, small/medium DBs, dev/test, etc. Families: E2, N1, N2, N2D. |
Compute Optimized (C2) | Highest performance per core, ideal for compute-intensive tasks (HPC, gaming, single-threaded workloads). Only available as predefined machine types. |
Memory Optimized (M1/M2) | Ultra-high memory for large in-memory DBs (SAP HANA) or analytics. Up to 12 TB RAM. Only available as predefined. |
Predefined Types | Google provides a set of standard shapes (e.g., n2-standard-4). Also have high-memory or high-CPU variants. |
Custom Machine Types | Define your own vCPUs & memory (within limits). Available for general-purpose families (e2, n1, n2, n2d). Perfect if pre-defined types don’t match your ratio needs. Slightly higher cost than an equivalent pre-defined type. |
Shared-Core Types | F1-micro, G1-small (N1) or e2-micro/small/medium. These use fractional CPU allocation & can burst CPU usage occasionally. Low-cost, best for small workloads, dev/test, or rarely used services. |
GPUs | Attach NVIDIA GPUs (e.g., Tesla K80, P100, etc.) only on n1 machine types. Used for ML training, HPC, or 3D rendering. |
Concept | Definition / Explanation |
---|---|
Instance Lifecycle | - Provisioning → Staging → Running → (stop/suspend/terminate). - Paying for CPU & memory only when in Running or Repair states. You still pay for attached disks/IPs even if suspended/stopped. |
Stopping / Suspending | - Stop: Shuts down the OS. Then transitions to Terminated. Does not incur CPU cost, but you pay for static IP & disks. - Suspend: Similar to “close laptop lid.” VM state & memory is preserved, but you still pay for disk & IP. |
Live Migration | GCP can move your running VM to another host during maintenance without reboot. You can also do manual cross-zone moves within a region. |
Shielded VMs | Ensures verifiable integrity of the VM’s boot sequence. Components: secure boot, vTPM, measured boot, integrity monitoring. Prevents low-level rootkits or boot malware. |
Guest Environment | Scripts & daemons installed in the OS that handle instance setup (metadata, ssh key injection, etc.). Public images come with it by default. For custom images, you may need to install it yourself. |
Metadata & Startup Scripts | - Metadata: Key-value pairs accessible via http://metadata.google.internal. - Startup/Shutdown Scripts: Scripts set in metadata or instance config to run on VM boot / shutdown. |
OS Login | An alternative to managing SSH keys in instance/project metadata. Ties SSH access to IAM roles. Allows 2FA for SSH. |
Windows Login | Use “Set Windows Password” to generate credentials. Connect via RDP on port 3389. Alternatively, use OS Login for Windows if you want. |
Action | Definition / Explanation |
---|---|
Linux SSH | - Via Console: “SSH in Browser.” - Via Cloud Shell or local gcloud: gcloud compute ssh instance-name. - OS Login recommended for user management. |
Windows RDP | - Enable an RDP inbound firewall rule on port 3389. - Use “Set Windows Password” in the console or gcloud. - Then RDP with IP, username, password. |
SSH Key Management | - If not using OS Login, store public keys in instance or project metadata. - Possibly block project-wide SSH keys if you want instance-level only. |
Powershell Remoting (WinRM) | - If using remote PowerShell on Windows, open port 5986. - Provide credentials. |
Browser-based | - “Open in browser window” for quick SSH. - For Windows, a “Chrome RDP” extension or .rdp file. |
Concept | Definition / Explanation |
---|---|
Instance & Project Metadata | - Metadata is stored as key-value pairs, accessible within GCP via http://metadata.google.internal/. - There are default metadata (e.g., instance name, zone) + custom metadata (user-defined) at project or instance level. - Default metadata is always present; custom metadata can be set in the console, CLI, or API. |
Startup & Shutdown Scripts | - Startup Scripts run on VM boot (e.g., install packages, configure software). - Shutdown Scripts run on VM shutdown (e.g., cleanup tasks, exporting logs). - Stored in metadata (key: `startup-script` or `shutdown-script`), or in a file that references a Cloud Storage URL. |
Use Cases | - Dynamic config: e.g., pass parameters to a startup script using metadata. - Automated installs & updates. - Automatic data exports on shutdown. |
Metadata Queries | - Use `curl` or `wget` with the special header `Metadata-Flavor: Google`. - Endpoints: `/computeMetadata/v1/instance/...` or `/computeMetadata/v1/project/...`. - Common queries: instance name, zone, custom metadata (under `/attributes`). |
Block Project-Wide SSH Keys | - Instance metadata can override project-wide keys. - Checking “Block project-wide SSH keys” means only keys in that instance’s metadata or OS Login apply. |
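A minimal sketch of querying the metadata server from inside a VM and attaching a startup script at creation time (VM name, zone, key names, and the script file are placeholders):

```bash
# Inside the VM: the Metadata-Flavor header is required.
curl -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/name"
curl -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/attributes/my-key"

# Attach a startup script (stored under the startup-script key) plus custom metadata.
gcloud compute instances create my-vm \
  --zone=us-central1-a \
  --metadata-from-file=startup-script=startup.sh \
  --metadata=my-key=my-value
```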
Concept | Definition / Explanation |
---|---|
Resource-Based Billing | - vCPU, memory, disk, etc. are each billed individually. - On-demand usage billed per second (1-minute minimum). |
Instance Uptime | - Billed while instance is running (or in “repair” state). - Stopped/suspended instances do not incur vCPU/memory costs, but disks and static IPs still accrue charges. |
Reservations | - You can reserve VM resources in a zone for future use. - Pay on-demand rates while reserved. - Ensures capacity is always available to you. - Still eligible for sustained/committed use discounts. |
Sustained Use Discounts | - Automatic discounts for running certain VMs for a significant fraction of the month. - Up to 30% off for N1 (vCPU + memory), 20% for N2/N2D/C2. - Combine usage across same region & VM type for bigger discount. |
Committed Use Discounts | - 1-year or 3-year commitment for vCPU/memory/gpu. - Up to 57% (or 70% for memory-optimized) discount. - Pay monthly whether you use it or not. - If usage > commitment, extra is at on-demand rate. - E2, N1, N2, N2D, C2 are supported. |
Preemptible VMs | - Up to 80% cheaper than on-demand. - Compute Engine can shut down (preempt) your VM at any time, and always within 24 hours. - Ideal for batch or fault-tolerant workloads. - No SLA, no live migrate, no automatic restart. |
Spot VMs | - Cost-effective alternative to on-demand VMs, typically 60–91% cheaper. - Compute Engine can reclaim Spot VMs at any time when capacity is needed for other workloads. - No 24-hour limit (unlike Preemptible VMs), so they can run longer if capacity remains available. - Ideal for batch processing, machine learning training, CI/CD jobs, and fault-tolerant workloads. - No SLA, no live migration, no automatic restart, but can be combined with managed instance groups for resiliency. |
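An illustrative comparison of the two provisioning options (names, zone, and machine type are placeholders; flags reflect current gcloud behavior):

```bash
# Spot VM: reclaimable at any time, no 24-hour cap.
gcloud compute instances create batch-worker \
  --zone=us-central1-a \
  --machine-type=e2-medium \
  --provisioning-model=SPOT \
  --instance-termination-action=STOP

# Legacy preemptible VM: always terminated within 24 hours.
gcloud compute instances create batch-worker-legacy \
  --zone=us-central1-a \
  --machine-type=e2-medium \
  --preemptible
```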
Concept | Definition / Explanation |
---|---|
Block Storage | - Data split into evenly sized blocks, each with unique ID. - Presented to OS as raw volume/hard drive. - Fastest type, often used as boot volumes (e.g., persistent disks, local SSD on Compute Engine). |
File Storage | - Data structured in hierarchical directories (files/folders). - Already structured, usually network-attached (e.g., NFS). - In GCP, provided via Filestore service (not bootable, purely shared file storage). |
Object Storage | - Data stored as “objects” with metadata + unique ID. - Infinitely scalable, often used for unstructured data (images, logs). - In GCP, Cloud Storage is object storage (flat namespace, not directly bootable, but can be FUSE-mounted). |
Performance Terms | - I/O (IO): Single read/write request. - Queue Depth: # of IOs pending. - IOPS: IO operations per second. - Throughput: data transfer speed (MB/s). - Latency: time for each IO to complete (ms). - Sequential vs. Random: large sequential vs. scattered small ops. |
Concept | Definition / Explanation |
---|---|
PD Types | 1. Standard (pd-standard): Backed by HDD, cheapest, best for sequential IO (large reads/writes). 2. Balanced (pd-balanced): Mid-tier cost/performance, good general-purpose option. 3. SSD (pd-ssd): Fastest PD type with low latency, high IOPS, higher cost. |
Zonal vs. Regional | - Zonal: Resides in a single zone. - Regional: Synchronously replicated across two zones in same region for higher availability. - Regional is slower & more expensive but more fault-tolerant. |
PD Characteristics | - Max 64 TB per disk. - Disks are network-attached (not physically attached). - Resizable online (bigger only). - Encrypted at rest (can use default or customer-managed keys). - Independent lifecycle from the VM (attach/detach, keep disk after VM delete). |
PD Performance | - Performance scales with disk size & vCPU count. - Must have enough vCPUs to drive desired IOPS. - For standard/balanced/SSD PD: the bigger the disk, the higher the IOPS/throughput. |
Snapshot | - Incremental backups at block level. - Typically used for zonal PD to replicate data or keep backups. - Snapshots can be stored in multi-regions or single region. |
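A short sketch of creating a zonal SSD PD and a regional balanced PD (names, sizes, zones, and region are placeholders):

```bash
# Zonal SSD persistent disk.
gcloud compute disks create fast-data-disk \
  --zone=us-central1-a \
  --type=pd-ssd \
  --size=200GB

# Regional balanced disk, synchronously replicated across two zones.
gcloud compute disks create ha-data-disk \
  --region=us-central1 \
  --replica-zones=us-central1-a,us-central1-b \
  --type=pd-balanced \
  --size=200GB
```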
Concept | Definition / Explanation |
---|---|
Local SSD | - Physically attached to the host server. - Highest IOPS & lowest latency. - Limited to 24 x 375 GB partitions = max 9 TB. |
Volatile Data | - Data is lost when VM is stopped, deleted, or moved. - Good for caches, scratch data, or ephemeral workloads. |
NVMe vs. SCSI | - SCSI is older, single queue. - NVMe (non-volatile memory express) is newer, supports many queues & commands, typically offers higher IOPS/throughput. |
Availability | - Only for N1, N2, and compute-optimized VM families. - Not attachable/detachable. Must be chosen at instance creation. |
Performance | - Very high read/write ops (millions of IOPS). - Lower latency than PD. |
Action | Definition / Explanation |
---|---|
Create a Persistent Disk | - Zonal or Regional. - Blank or from image/snapshot. - Choose type (standard, balanced, SSD) & size. |
Attach/Detach | - Attach disk to a running VM or a stopped VM (except local SSD). - On Linux, must format + mount. - On Windows, must initialize in Disk Management. |
Resizing a Persistent Disk | - Disks can be expanded without downtime. - Must resize the filesystem inside the OS (e.g., `resize2fs` on Linux). |
Mounting & FSTAB | - After formatting, create mount point & add entry to `/etc/fstab` (Linux) for auto-mount on reboot. |
Data Persistence | - PD remains intact even if VM is deleted (unless “delete disk” is selected). - Local SSD data is lost on VM stop/delete. |
Deleting Disks | - Must detach first from a running VM (or delete VM if boot disk). - Freed resources stop incurring cost. |
Snapshot | - Create from a disk for backup or migration. - Snapshots are incremental, can restore to new disk or instance. |
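A hedged end-to-end sketch of attaching, formatting, mounting, and growing a data disk on a Linux VM (disk/VM names, zone, and mount point are placeholders):

```bash
# Attach the disk to a running VM with a predictable device name.
gcloud compute instances attach-disk my-vm \
  --disk=fast-data-disk --device-name=fast-data-disk --zone=us-central1-a

# Inside the VM: format, mount, and persist the mount across reboots.
sudo mkfs.ext4 -m 0 -F /dev/disk/by-id/google-fast-data-disk
sudo mkdir -p /mnt/disks/data
sudo mount -o discard,defaults /dev/disk/by-id/google-fast-data-disk /mnt/disks/data
echo '/dev/disk/by-id/google-fast-data-disk /mnt/disks/data ext4 discard,defaults,nofail 0 2' \
  | sudo tee -a /etc/fstab

# Grow the disk online, then grow the filesystem to match.
gcloud compute disks resize fast-data-disk --size=400GB --zone=us-central1-a
sudo resize2fs /dev/disk/by-id/google-fast-data-disk
```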
Concept | Definition / Explanation |
---|---|
Snapshot Fundamentals | - Snapshots are incremental, point-in-time backups of persistent disks (zonal or regional). - Snapshots can be taken from running or stopped instances (disks do not have to be detached). - They are global resources: can create new disks in any region from a snapshot. |
Location & Storage | - Stored in Cloud Storage. Choose multi-regional (higher availability, higher cost) or regional (lower cost, but limited to a single region). - If snapshot region = disk region, no network charge for snapshot/restore in that region. |
Incremental & Compression | - The first snapshot is a full snapshot of the disk. - Subsequent snapshots only store changed or new blocks since the last successful snapshot. - Snapshots are compressed automatically. |
Deleting Snapshots | - Deleting a snapshot does not necessarily remove all of its data if other snapshots depend on it. - Compute Engine manages the references among snapshots automatically, moving any blocks still needed to the next snapshot in the chain. |
Frequency & Best Practices | - Minimum 10 minutes between snapshots of the same disk. - Regular snapshots reduce data-loss risk. - Off-peak snapshot times = faster & cheaper if data changes are fewer. - For Windows: Volume Shadow Copy Service (VSS) can be used for consistent snapshots. |
Concept | Definition / Explanation |
---|---|
Manual Snapshots | - One-off snapshots can be created from the console or CLI. - Must specify the source disk, snapshot name, and region or multi-region storage location. - Snapshots are incremental, so repeated snapshots are quick & cost less. |
Snapshot Schedules | - Automate periodic snapshots of a given disk. - One schedule per disk; must be in the same region as the disk. - Optionally define snapshot retention (e.g., “keep 14 days”), source disk deletion rule. - Attachable or detachable from a disk. |
Manage Snapshots & Schedules | - You can detach a schedule or delete it after detaching from all disks. - Schedules cannot be edited. Instead, remove and re-create with different settings. - Snapshots remain until manually deleted or retention policy cleans them up. |
Creating a Disk from Snapshot | - Create new disk in any region from an existing snapshot. - The new disk can then be attached to a VM as a data disk or a boot disk (if snapshot is from a bootable disk). |
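A minimal sketch covering a one-off snapshot, a restore, and an attached snapshot schedule (names, zones, and retention values are placeholders):

```bash
# One-off snapshot of a zonal disk.
gcloud compute disks snapshot fast-data-disk \
  --zone=us-central1-a \
  --snapshot-names=fast-data-disk-snap-001

# New disk (in any zone) restored from that snapshot.
gcloud compute disks create restored-disk \
  --zone=europe-west1-b \
  --source-snapshot=fast-data-disk-snap-001

# Daily snapshot schedule kept for 14 days, attached to the disk.
gcloud compute resource-policies create snapshot-schedule daily-backup \
  --region=us-central1 \
  --daily-schedule \
  --start-time=03:00 \
  --max-retention-days=14
gcloud compute disks add-resource-policies fast-data-disk \
  --zone=us-central1-a \
  --resource-policies=daily-backup
```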
Concept | Definition / Explanation |
---|---|
Deployment Manager | - GCP’s infrastructure-as-code tool for automating resource provisioning. - Uses YAML for configurations + optional Jinja or Python templates. - Deploy, update, and delete resources in a single, repeatable workflow. |
Key Components | - Configuration: The main YAML file describing resources. - Template(s): Reusable building blocks (Jinja/Python). - Deployment: A collection of resources managed together. |
Resource Types | - Base types: e.g., compute.v1.instance . - Composite types: e.g., gcp-types/compute-v1:instances (bundled sets of resources). |
Properties & References | - Properties: Parameters for the resource. Must match the specific API’s fields (e.g., machineType, network). - References: Let one resource read values from another resource (e.g., $(ref.resourceName.selfLink) ). |
Manifests | - Read-only descriptor for each deployment. Auto-created when you deploy. - Summarizes expanded config + resources (similar to a “compiled” version). |
Concept | Definition / Explanation |
---|---|
Workflow | 1. Write config (`.yaml`) + optional templates (`.jinja`/`.py`). 2. Preview (`--preview`) or deploy (`gcloud deployment-manager deployments create ...`). 3. Update (`gcloud deployment-manager deployments update ...`) or delete resources. |
Preview Mode | - Doesn’t provision any resources. - Helps catch errors in your config or templates before real deployment. |
Templates | - Split configurations into smaller re-usable .jinja or .py files. - Use environment variables + custom properties to handle dynamic values. |
References | - Use $(ref.myResource.property) to refer to output from one resource as an input to another. - Ensures correct order of creation (dependency). |
Best Practices | - Keep separate config for major categories (e.g., network vs. compute vs. security). - Always preview changes. - Use version control (Git) + automation (CICD). - Use references to handle resource dependencies. - Automate project creation if needed. |
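A minimal single-VM configuration to illustrate the workflow; the project ID, zone, image family, and resource names are placeholders, and the resource URLs may need adjusting for your project:

```bash
cat > vm.yaml <<'EOF'
resources:
- name: dm-example-vm
  type: compute.v1.instance
  properties:
    zone: us-central1-a
    machineType: https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/us-central1-a/machineTypes/e2-medium
    disks:
    - boot: true
      autoDelete: true
      initializeParams:
        sourceImage: projects/debian-cloud/global/images/family/debian-12
    networkInterfaces:
    - network: https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/global/networks/default
EOF

# Preview first, then execute the previewed changes, and finally clean up.
gcloud deployment-manager deployments create my-deployment --config=vm.yaml --preview
gcloud deployment-manager deployments update my-deployment
gcloud deployment-manager deployments delete my-deployment
```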
Concept | Definition / Explanation |
---|---|
Load Balancer Purpose | Distribute traffic across multiple resources (VMs, instance groups, etc.) to increase availability, reduce latency, and improve overall user experience. |
Software-Defined & Global | GCP load balancers are fully software-defined, no hardware devices needed. Certain GCP load balancers can be global (premium tier) or regional (standard tier). |
Forwarding Rule | Directs traffic (based on protocol and port) to a target backend (e.g., backend service or target pool). |
Backend Service | Defines how the load balancer distributes traffic to back ends. Contains settings like health checks, session affinity, timeout, and references to instance groups or NEGs as back ends. |
LB Type | Key Characteristics |
---|---|
HTTP(S) Load Balancing | - Global, Layer 7 (application). - Proxy-based: terminates HTTP(S) traffic at Google Front Ends (GFE). - Supports cross-region distribution & content-based routing (URL maps). - IPv4 & IPv6, IPv6 terminates at LB → forwards IPv4 to backend. - Premium-tier = global; standard-tier = regional. |
SSL Proxy Load Balancing | - Global, Layer 4 (TCP over SSL). - Terminates SSL at LB, re-encrypt or pass plain TCP to backend. - IPv4 & IPv6, IPv6 terminates at LB. - Only supports TCP with SSL (proxy). |
TCP Proxy Load Balancing | - Global, Layer 4 (TCP). - Proxy-based: Terminates TCP at LB, can re-establish TCP or SSL to backend. - IPv4 & IPv6, IPv6 terminates at LB. |
Network Load Balancing (NLB) | - Regional, Layer 4 (TCP/UDP). - Pass-through: no termination, direct server return. - Balances TCP, UDP, SSL (self-managed). - Great for non-HTTP protocols needing direct IP:port LB. |
Internal Load Balancing (ILB) | - Regional, Layer 4 (TCP/UDP). - Internal only, not internet-facing. - Balances traffic within a VPC (private IP addresses). |
Concept | Definition / Explanation |
---|---|
Instance Template | A resource that defines a VM’s configuration (machine type, disk, metadata, etc.). Re-used to create multiple VMs or Managed Instance Groups (MIGs). |
No Editing Templates | Once created, cannot edit. Must create a new template if a config changes. |
Usage | - Create MIG using an instance template. - Optionally base a new template on an existing instance. - Includes OS images (public/custom), metadata, machine type, disks, network, etc. |
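An illustrative template creation command (machine type, image family, tags, and the startup script file are placeholders):

```bash
gcloud compute instance-templates create web-template \
  --machine-type=e2-medium \
  --image-family=debian-12 \
  --image-project=debian-cloud \
  --tags=http-server \
  --metadata-from-file=startup-script=startup.sh
```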
Concept | Definition / Explanation |
---|---|
MIG Overview | - A fleet of identical VMs (stateless recommended). - MIG automatically handles scaling, healing, auto-updates, multi-zone/regional deployments, etc. - Must use instance templates to create identical VMs. |
Auto Healing | - Uses MIG health checks to replace unhealthy instances automatically. - Distinct from LB health checks (which only remove from traffic, not recreate). |
Auto Scaling | - Automatically add/remove VMs to match load (CPU utilization, custom metrics, LB-based). - Scales in to reduce cost, out to handle demand. |
Rolling Updates | - Update MIG with minimal downtime (gradual replacement). - Optionally do canary (partial rollout) with controlled pace. |
Regional vs. Zonal MIG | - Regional MIG: Distribute instances across multiple zones in the same region (higher availability). - Zonal MIG: All instances in a single zone. |
Preemptible VMs | - MIG can include preemptible instances for cost savings. - Auto healing replaces them if capacity is available when preempted. |
Stateful MIG | - Keep per-instance state (e.g., persistent disk, instance name). - Useful for partial stateful apps or unique configs, but MIG still handles auto healing. |
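A hedged sketch of a regional MIG built from the template above, with autoscaling and auto healing (names, region, thresholds, and health-check path are placeholders):

```bash
# Regional MIG of 3 instances from the instance template.
gcloud compute instance-groups managed create web-mig \
  --region=us-central1 \
  --template=web-template \
  --size=3

# Autoscale on CPU utilization between 3 and 10 instances.
gcloud compute instance-groups managed set-autoscaling web-mig \
  --region=us-central1 \
  --min-num-replicas=3 \
  --max-num-replicas=10 \
  --target-cpu-utilization=0.6

# Auto healing: replace instances that fail an HTTP health check.
gcloud compute health-checks create http web-hc --port=80 --request-path=/healthz
gcloud compute instance-groups managed update web-mig \
  --region=us-central1 \
  --health-check=web-hc \
  --initial-delay=120
```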
Concept | Definition / Explanation |
---|---|
Unmanaged Instance Group | - Heterogeneous (mixed machine types, OS, etc.). - No auto scaling, auto healing, or rolling updates. - Use only if you need load balancing across a custom set of distinct instances that you manage manually. |
No Templates | - You add existing instances to an unmanaged group. You handle all lifecycle events. |
Concept | Definition / Explanation | Key CLI Commands |
---|---|---|
Containers | Lightweight application bundles containing all dependencies. Share OS kernel while isolating processes in each container. | (Docker commands, for reference) `docker build -t [image:tag] .` `docker run -p 80:80 [image:tag]` |
Container Registry | A storage system for container images (public or private). GCP provides Artifact Registry or Container Registry. | `gcloud artifacts repositories create ...` `gcloud container images list` (for older Container Registry usage) |
Dockerfile Layers | Each line in a Dockerfile forms a new read-only layer. Final container = all layers + top read-write layer at runtime. | N/A in GCP CLI, but essential for Docker |
Pods | In pure Kubernetes, 1+ containers in a single deployable object. GKE always runs containers inside pods. | `kubectl run myapp --image=...` (in current Kubernetes versions this creates a single Pod; use `kubectl create deployment` for a managed Deployment) |
Concept | Definition / Explanation | Key CLI Commands |
---|---|---|
Kubernetes | An open-source container orchestration platform. Automates scheduling, scaling, networking for containerized apps. | `kubectl` commands to interact with any Kubernetes cluster. |
GKE (Google Kubernetes Engine) | Managed environment on GCP for Kubernetes clusters. GCP manages the control plane (master) while you control node configs. | `gcloud container clusters create [CLUSTER_NAME] --zone ...` `gcloud container clusters list` `kubectl` for cluster interactions. |
Control Plane | Composed of API Server (kube-apiserver), Scheduler (kube-scheduler), Controller Manager (kube-controller-manager), etcd. Coordinates cluster state. GKE manages these for you. | GKE auto-manages control plane, no direct `gcloud` to manage it. |
Nodes | Worker machines (Compute Engine VMs in GKE). Run container runtime (Docker/Containerd) + kubelet (agent). | Created automatically by `gcloud container clusters create`. |
Node Pools | Group of nodes sharing configuration (machine type, size, disk, etc.). You can have multiple node pools in a cluster for different workloads. | `gcloud container node-pools create [POOL_NAME] --cluster [CLUSTER_NAME]` `gcloud container node-pools list --cluster [CLUSTER_NAME]` |
Namespaces | Virtual clusters within a physical cluster. Isolate apps or teams. Pre-defined namespaces: default, kube-system, kube-public, kube-node-lease. | `kubectl get namespaces` `kubectl create namespace [NAME]` `kubectl apply -f myapp.yaml --namespace [NAME]` |
Concept | Definition / Explanation | Key CLI Commands |
---|---|---|
Cluster Types | - Zonal (single-zone or multi-zonal). - Regional (multi-zone control plane replicas). Multi-zonal & regional = better availability, typically higher cost. | `gcloud container clusters create [CLUSTER_NAME] --region=us-east1 --enable-autorepair` `gcloud container clusters create ... --num-nodes=...` |
Private Clusters | Nodes do not have public IPs, only internal addresses. Control-plane can optionally disable public endpoint. More secure, more steps for connectivity (VPC peering, NAT if needed). | `gcloud container clusters create [CLUSTER_NAME] --enable-private-nodes --master-ipv4-cidr ...` `gcloud container clusters update ... --enable-master-authorized-networks ...` |
Release Channels | Automatic cluster version upgrades, stability tiers. Rapid, Regular, Stable. | `gcloud container clusters create [CLUSTER_NAME] --release-channel=regular` `gcloud container clusters update [CLUSTER_NAME] --release-channel=stable` |
Auto-Upgrades | GKE can automatically upgrade control plane & nodes to newer patch versions. Minimizes manual overhead. | `gcloud container clusters create [CLUSTER_NAME] --enable-autoupgrade` `gcloud container node-pools update [POOL_NAME] --enable-autoupgrade --cluster=[CLUSTER_NAME]` |
Manual Upgrades | You can pin cluster version & manually do `gcloud container clusters upgrade ...`. Only recommended if you have custom reasons to test each version. | `gcloud container clusters upgrade [CLUSTER_NAME] --cluster-version=...` `gcloud container node-pools upgrade [POOL_NAME] --cluster [CLUSTER_NAME]` |
Surge Upgrades | Controls how many nodes GKE upgrades in parallel (max-surge-upgrade) and how many can be temporarily unavailable (max-unavailable-upgrade). Reduces downtime at cost of extra nodes. | `gcloud container node-pools update [POOL_NAME] --cluster=[CLUSTER_NAME] --max-surge-upgrade=2 --max-unavailable-upgrade=1` |
Concept | Definition / Explanation | Key CLI Commands |
---|---|---|
Pods | Smallest K8s object. Runs 1 or more containers. Ephemeral & disposable. Typically created/managed by higher-level objects like Deployment. | `kubectl get pods` `kubectl describe pod [NAME]` `kubectl logs [POD_NAME]` |
Pod Spec & Status | The .spec in the manifest states container specs (image, ports, volumes, etc.). The .status is updated by the K8s system. | `kubectl apply -f pod.yaml` |
Deployments | Higher-level object that manages sets of replicated Pods (ReplicaSets). Handles rolling updates, rollback, scale. Great for stateless apps. | `kubectl create deployment [NAME] --image=...` `kubectl scale deployment [NAME] --replicas=...` `kubectl rollout undo deployment [NAME]` |
ReplicaSets | Ensures a specified number of pod replicas are running. Usually handled by Deployment. | Typically not managed directly; used behind the scenes by Deployments. |
StatefulSet | For stateful apps requiring persistent identity (like DBs). Retains pod identity across rescheduling. | `kubectl apply -f statefulset.yaml` |
DaemonSet | Ensures a pod runs on every node (logging/monitoring agents). | `kubectl apply -f daemonset.yaml` |
Jobs & CronJobs | - Job runs a finite task to completion (batch work). - CronJob is a scheduled repeating job. | `kubectl create job [NAME] --image=... -- [args]` `kubectl create cronjob [NAME] --schedule="0 * * * *" --image=... -- [args]` |
ConfigMaps & Secrets | Externalize config data or secret data. Mounted as env vars or volumes. Secrets are base64-encoded. | `kubectl create configmap [NAME] --from-literal=KEY=VALUE` `kubectl create secret generic [NAME] --from-literal=KEY=VALUE` |
Config Connector | A Kubernetes add-on that manages Google Cloud resources declaratively using Kubernetes-style configuration files. | Resources are declared as Kubernetes custom resources and applied with `kubectl apply -f ...` |
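Most of these objects are declared in YAML and applied with kubectl. A minimal Deployment manifest as an illustration; the app name and image path are placeholders:

```bash
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inventory
spec:
  replicas: 3
  selector:
    matchLabels:
      app: inventory
  template:
    metadata:
      labels:
        app: inventory          # pods carry this label; Services select on it
    spec:
      containers:
      - name: inventory
        image: gcr.io/my-project/inventory:v1.0.0   # placeholder image
        ports:
        - containerPort: 8080
EOF
```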
Concept | Definition / Explanation | Key CLI Commands |
---|---|---|
Service | Stable, persistent endpoint to access a set of pods. Each service gets a stable IP + DNS name (internal or external). Pods behind the service are dynamically updated using selectors (labels). | - kubectl get services - kubectl describe service [NAME] - kubectl expose deployment [DEPLOYMENT_NAME] --type=... --port=... |
Selector & Labels | Services route traffic to pods matching a label. Example: app: inventory. A label must match the service selector to register the pod behind that service. | - kubectl apply -f service.yaml (the spec.selector must match pods’ metadata.labels) |
ClusterIP (default) | Internal-only (virtual IP) accessible within the cluster. No external exposure. | - kubectl expose deployment [NAME] --type=ClusterIP --port=... |
NodePort | Exposes the service on a static port on each node (30000–32767). Access from outside via `[NODE_IP]:[NODE_PORT]`. | - kubectl expose deployment [NAME] --type=NodePort --port=80 (set a specific nodePort in the service manifest if needed) |
LoadBalancer | Provisions a cloud LB (e.g., GCP external load balancer). Traffic from LB -> NodePort -> Pods. Simplest way to get external IP if you want each service behind a separate LB. | - kubectl expose deployment [NAME] --type=LoadBalancer --port=80 |
Multi-Port Services | Service can map multiple ports (or port + targetPort pairs). Each port must have a unique name in the service spec. | - kubectl apply -f multiport-service.yaml |
ExternalName | Maps service DNS to an external DNS name. No cluster IP or pods. Simple CNAME-like alias. | - kubectl apply -f externalname.yaml |
Headless Service | spec.clusterIP: None. No clusterIP assigned. Allows direct pod endpoints discovery, often with StatefulSets. | - kubectl apply -f headless-service.yaml |
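A minimal Service manifest showing how the selector, port, and targetPort relate (names and ports are placeholders; change `type` to ClusterIP or NodePort as needed):

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: inventory-svc
spec:
  type: LoadBalancer          # ClusterIP (default) / NodePort / LoadBalancer
  selector:
    app: inventory            # must match the pods' labels
  ports:
  - name: http
    port: 80                  # port exposed by the Service
    targetPort: 8080          # containerPort on the selected pods
EOF
```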
Concept | Definition / Explanation | Key CLI Commands |
---|---|---|
Ingress | High-level object that defines HTTP(S) routing rules for multiple Services. GKE implements Ingress via GCP’s HTTP/HTTPS Load Balancer. One IP can serve multiple paths/hosts. | - kubectl apply -f ingress.yaml - kubectl get ingress |
Ingress Controller | In GKE, the built-in controller maps Ingress resources to a Google Cloud HTTP(S) LB. | - kubectl describe ingress [NAME] |
Ingress Rules | Map host/path -> backend service in cluster. Example: /discontinued routes to Service discontinued-service. | - In ingress.yaml, under .spec.rules[].http.paths[].backend.serviceName or backend.service. |
NEG (Network Endpoint Group) | Container-native LB: each pod is an endpoint. LB routes traffic directly to pod IP. More fine-grained than standard Service NodePort. | - Add the annotation `cloud.google.com/neg: '{"ingress": true}'` to the Service metadata to enable container-native load balancing. |
GCP SSL Certificates | Ingress can reference either Google-managed or self-managed SSL certificates. One LB can hold multiple certs for SNI-based routing. | - gcloud compute ssl-certificates create [CERT_NAME] --certificate [CERT_FILE] --private-key [KEY_FILE] - Then kubectl annotate ingress [NAME] ... |
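A minimal Ingress manifest that fans out two paths to two Services; service names, paths, and ports are placeholders (on GKE this provisions an external HTTP(S) load balancer):

```bash
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: store-ingress
spec:
  rules:
  - http:
      paths:
      - path: /inventory
        pathType: Prefix
        backend:
          service:
            name: inventory-svc
            port:
              number: 80
      - path: /discontinued
        pathType: Prefix
        backend:
          service:
            name: discontinued-service
            port:
              number: 80
EOF
```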
Concept | Definition / Explanation | Key CLI Commands |
---|---|---|
Ephemeral vs. Durable Storage | - Ephemeral: Tied to pod lifecycle (e.g., `emptyDir`). - Durable: Outlives pods, typically Persistent Disks or volumes from a storage provider. | - Pods reference ephemeral volumes (like `emptyDir`) in their manifest under `.spec.volumes[].emptyDir`. - Durable volumes are typically mounted via Persistent Volume Claims. |
Kubernetes Volume | Basic storage unit in a pod. Volumes outlive containers but die with the pod (unless using persistent volumes). Examples: `emptyDir`, `configMap`, `secret`, `downwardAPI`. | - `kubectl describe pod [NAME]` shows volumes defined in `.spec.volumes`. - `kubectl get pvc` etc. |
Persistent Volume (PV) | A cluster-wide resource representing a piece of durable storage in the cluster. The actual backing can be GCE persistent disks, Filestore, etc. Lifecycle managed by K8s. | - kubectl get pv - Usually created dynamically via Persistent Volume Claims + StorageClass. |
Persistent Volume Claim (PVC) | A request for storage by a user. Binds to a suitable PV that meets the spec (e.g., size, access modes). GKE can dynamically create a persistent disk upon PVC creation if using the default or a custom StorageClass. | - `kubectl get pvc` - `kubectl apply -f pvc.yaml` (see the PVC example after this table) |
StorageClass | Defines classes of storage offered in a cluster. GKE typically has a default `standard` class (and possibly `balanced`/`ssd`). Allows dynamic provisioning of persistent disks. | - `kubectl get storageclass` - `kubectl describe storageclass [CLASSNAME]` - `kubectl apply -f custom-storageclass.yaml` |
Access Modes | Describes how volumes can be mounted: ReadWriteOnce (RWO), ReadOnlyMany (ROX), ReadWriteMany (RWX). | - Specified in pvc.yaml under .spec.accessModes . |
Regional vs. Zonal Persistent Disk | - Zonal PD: Resides in a single zone. - Regional PD: Replicated across two zones in the same region, for higher availability (failover if a zone fails). | - Create a PVC referencing `volume-type: pd-standard`, or a custom StorageClass that enables regional replication. |
Container-Native Storage | You can use GCE persistent disks, Filestore (NFS), or Cloud Storage FUSE with GKE. Cloud SQL can also be used externally. Simplest approach is persistent disks via PVC/StorageClass. | - GCE PD is automatically used if you choose storageClassName: standard in your PVC.- Filestore requires an NFS-based approach or Filestore CSI driver. |
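The PVC example referenced above — a minimal claim that dynamically provisions a persistent disk through the default StorageClass (name, size, and class are placeholders):

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes:
  - ReadWriteOnce             # RWO: mountable read-write by a single node
  storageClassName: standard  # GKE default class backed by a persistent disk
  resources:
    requests:
      storage: 30Gi
EOF
```

Pods then mount the claim by name under `.spec.volumes[].persistentVolumeClaim.claimName`.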
Concept | Definition / Explanation | Key CLI Commands |
---|---|---|
Creating a GKE Cluster | GKE managed environment for Kubernetes. You can create via the Console or via `gcloud container clusters create`. You define cluster type (zonal or regional), node machine types, release channels, etc. | - Console: "Kubernetes Engine" → "Create Cluster" → Fill details. - CLI: `gcloud container clusters create [CLUSTER_NAME] --num-nodes=3 --zone=[ZONE] --release-channel=regular ...` |
Node Pools | Clusters have node pools. Each pool is a group of node VMs sharing the same configuration. You can add, remove, or update node pools without affecting the entire cluster. | - gcloud container node-pools create [POOL_NAME] --cluster [CLUSTER_NAME] ... --num-nodes=2 - gcloud container node-pools delete [POOL_NAME] --cluster [CLUSTER_NAME] |
Setting Up kubectl | Must retrieve the cluster’s credentials so kubectl can communicate with the new cluster’s control plane. | - `gcloud container clusters get-credentials [CLUSTER_NAME] --zone [ZONE]` - Check: `kubectl get nodes`, `kubectl get all` |
Deploying a Container | - Option 1: Use `kubectl create deployment ... --image=...` - Option 2: Use GKE console "Deploy". Then "Expose" to create a Service (LoadBalancer, etc.). | - `kubectl create deployment box-of-bowties --image=gcr.io/[PROJECT]/box-of-bowties:v1.0.0` - `kubectl expose deployment box-of-bowties --type=LoadBalancer --port=80` |
Scaling a Deployment | Increase/Decrease the number of pods (replicas). For zero downtime, K8s performs rolling expansion/contraction. | - Console: Workloads → "Scale". - CLI: kubectl scale deployment box-of-bowties --replicas=3 - kubectl get pods (verify) |
Rolling Updates | Seamless updates. Replaces old pods with new pods, one by one. Minimizes downtime. | - Console: "Workloads" → "Rolling Update" → Provide new container image digest. - CLI: kubectl set image deployment/box-of-bowties box-of-bowties-container=gcr.io/[PROJECT]/box-of-bowties:v1.0.1 --record |
Cloud Build & Container Registry | - Cloud Build: CI/CD service. Build Docker images from source inside GCP, push to registry. - Container Registry: Stores Docker images. GCR is integrated with GCP auth + scanning. | - `gcloud builds submit --tag gcr.io/[PROJECT]/box-of-bowties:v1.0.0 .` - `gcloud container images list-tags gcr.io/[PROJECT]/box-of-bowties` - `gcloud container images delete gcr.io/[PROJECT]/box-of-bowties:v1.0.0` (cleanup) |
Cleanup | Delete resources to avoid costs: 1. Delete the LB Service 2. Delete the Deployment 3. Delete the container images 4. Delete GCS build artifacts 5. Delete the GKE cluster. | - LB/Service: `kubectl delete service box-of-bowties-service` - Deployment: `kubectl delete deployment box-of-bowties` - Cluster: `gcloud container clusters delete [CLUSTER_NAME] --zone [ZONE]` - Images: `gcloud container images delete ...` |
Concept | Description |
---|---|
Purpose / Use Cases | - Securely connect an on-premises network to a VPC over an IPsec VPN tunnel. - Good if you have moderate traffic, want encryption, and can tolerate latencies of public internet. - Site-to-site only (no client VPN). |
Key Features | - Encryption at L3 (IPsec). - HA VPN offers 99.99% SLA (two interfaces/two external IPs). - Classic VPN offers 99.9% SLA. (Google recommends new deployments to use HA VPN.) |
Routing | - Static or dynamic routing supported (dynamic with BGP/Cloud Router). - Each HA VPN gateway interface can support multiple tunnels. - Speeds up to ~3 Gbps per tunnel. |
Connectivity | - Traffic traverses public internet—but is IPsec-encrypted. - Combine with Private Google Access for on-prem hosts. |
Classic vs. HA VPN | - Classic VPN: Single IP, single interface, up to 3 Gbps, 99.9% SLA. - HA VPN: Two IPs (active/active), 99.99% SLA if both interfaces used with two external IPs, dynamic routing only. |
Concept | Description |
---|---|
Purpose / Use Cases | - Dedicated private connection from on-prem data center to Google’s network (no public internet). - High throughput, low latency. Great for large data volumes and production workloads needing stable connectivity. |
Dedicated Interconnect | - Physical link (10 Gbps or 100 Gbps) from on-prem to Google’s colocation facility (PoP). - Up to 200 Gbps total per interconnect. - Must be in a Google-supported colocation facility. - Offers private IP routing. |
Partner Interconnect | - Connect via a service provider instead of direct facility for “last-mile” connectivity. - Supports smaller increments: 50 Mbps up to 50 Gbps attachments. - Still private IP traffic, leveraging partner’s physical link. |
Cloud Router + BGP | - For dynamic routing, a Cloud Router is used with (HA) VPN or Interconnect. - BGP sessions exchange routes between on-premises network and GCP VPC. |
Direct vs. Partner | - Dedicated if you already have presence in colocation facility, need 10–100 Gbps per link. - Partner if you can’t reach a colocation PoP or only need smaller capacity. |
Aspect | Description |
---|---|
Definition | Fully-managed PaaS for hosting web apps in Google Cloud. Handles provisioning, scaling, patching. Just upload your code and let GCP do the heavy lifting. |
Standard vs. Flexible | Standard: Runs in language-specific runtimes (Python, Node.js, Java, Go, etc.). Sandboxed environment, free tier available, ephemeral local disk. Flexible: Runs in Docker containers on GCE VMs, no free tier, uses OS-level access. |
Scaling Types | - Automatic: Scale up/down based on load (can go to zero). - Basic: Instances start on request, shut down when idle. Good for intermittent workloads. - Manual: Specify fixed number of instances. |
Services / Versions | - An App Engine app can have multiple services (like microservices). - Each service can have multiple versions (for rollbacks, testing, traffic splitting). |
Traffic Management | - Traffic Migration: Move traffic from old version to new version immediately or gradually (standard environment only). - Traffic Splitting: Route percentages of traffic to each version (A/B test). |
Deploying | - Typically: gcloud app deploy [YOUR_APP_YAML] - Distinct app.yaml per service. |
Supported Languages | Common runtimes (Node.js, Python, Java, Go, PHP, Ruby, .NET). Custom Docker (flex environment). |
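A hedged sketch of a standard-environment `app.yaml` and the deploy/traffic commands (runtime, service name, and version IDs are placeholders):

```bash
cat > app.yaml <<'EOF'
runtime: python312          # standard-environment runtime (placeholder)
service: default
automatic_scaling:
  max_instances: 5
EOF

gcloud app deploy app.yaml                                        # deploy a new version
gcloud app browse                                                 # open the app in a browser
gcloud app services set-traffic default --splits=v1=0.9,v2=0.1   # split traffic across versions
```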
Aspect | Description |
---|---|
Definition | Serverless “function as a service” for single-purpose, event-driven code. GCP automatically handles provisioning, scaling, patching. |
Key Features | - Supports Node.js, Python, Go, Java, .NET Core. - Integrations with HTTP triggers, or background triggers (Pub/Sub, Cloud Storage, Firestore, etc.). - Priced by execution time and invocations. |
Execution Model | - Stateless: each invocation is handled by an instance of your function. - Single concurrency per instance, no parallel requests on same instance. |
Triggers | HTTP (direct calls), Pub/Sub (event), Cloud Storage (file uploads/deletes), Firestore (document changes). |
Deployment | - gcloud functions deploy [FUNCTION_NAME] --runtime [LANGUAGE] --trigger-http / --trigger-bucket / --trigger-topic... - Source code can be inline in the console or uploaded from local/Cloud Source Repos. |
Networking | - By default: outgoing to internet is allowed, internal VPC not allowed unless you configure a VPC connector. - Ingress control can restrict function access to internal only or LB only. |
Use Cases | - Quick data transformations, e.g. image thumbnail creation on file upload. - Asynchronous event handlers, e.g. after a Pub/Sub message. - Serverless APIs, e.g. for webhooks or form submissions. |
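Two illustrative deployments: an HTTP-triggered function and a Cloud Storage-triggered function (function names, runtime, bucket, and region are placeholders; source defaults to the current directory):

```bash
# HTTP trigger.
gcloud functions deploy hello-http \
  --runtime=python311 \
  --trigger-http \
  --allow-unauthenticated \
  --region=us-central1

# Background trigger: runs on every object uploaded to the bucket.
gcloud functions deploy on-upload \
  --runtime=python311 \
  --trigger-bucket=my-upload-bucket \
  --region=us-central1
```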
Aspect | Description |
---|---|
Definition | Global, large-capacity object storage for unstructured data. Store files/objects in buckets with globally unique names. |
Use Cases | - Storing large data sets (e.g., images, videos, archives). - Content distribution / direct public hosting. - Backup or big data analytics source. - Serving static website content. |
Buckets | - Top-level container for objects (no nesting buckets). - Name must be globally unique. - Choose location (region, dual-region, multi-region). - Choose default storage class (Standard, Nearline, Coldline, Archive). |
Objects | - Stored files in buckets, up to TBs in size. - Immutable: replaces old version, cannot edit in place. - Metadata includes object’s name, generation, etc. - No limit on number of objects. |
Storage Classes | - Standard: Frequent access, ~0.02 USD/GB/mo. - Nearline (30-day min storage): Infrequent (~1x/mo) usage, ~0.01 USD/GB/mo. - Coldline (90-day min): Rarely accessed (~1x/quarter). - Archive (365-day min): ~1x/year or long-term. |
Geo-Options | - Region (lowest-latency to your region). - Dual-Region (2 separate regions for HA). - Multi-Region (spreads data across a continent). |
Access Control | - IAM (recommended) to manage bucket- or project-level permissions. - ACLs (fine-grained object-level control, older approach). - Signed URLs for temporary controlled access. - Signed Policy Docs for controlled uploads. |
Lifecycle Management | - Automatically transition storage class (e.g., Standard→Coldline) or delete older objects. - Configured via JSON or console rules (conditions + actions). |
Object Versioning | - Store older (noncurrent) versions instead of overwriting. - Increases storage cost, often combined with lifecycle rules (delete older versions after N days). |
Typical Commands | - gsutil cp : Copy local→GCS or GCS→GCS.- gsutil mv : Move objects, changing generation number if versioning enabled.- gsutil lifecycle set/get : Manage JSON-based lifecycle rules. |
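A sketch of creating a bucket and applying a lifecycle policy that transitions and then deletes objects (bucket name, ages, and classes are placeholders):

```bash
# Regional bucket with Standard as the default class.
gsutil mb -l us-central1 -c standard gs://my-example-bucket

# Lifecycle: move to Coldline after 90 days, delete noncurrent versions after 365 days.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
     "condition": {"age": 90}},
    {"action": {"type": "Delete"},
     "condition": {"age": 365, "isLive": false}}
  ]
}
EOF

gsutil lifecycle set lifecycle.json gs://my-example-bucket
gsutil lifecycle get gs://my-example-bucket
```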
Aspect | Description |
---|---|
Definition | Fully managed relational DB service. Supports MySQL, PostgreSQL, and SQL Server. Google handles provisioning, maintenance, backups, HA config, etc. |
Storage & Scaling | - Up to 30 TB persistent disk per instance. - Choose HDD or SSD. - Automatic storage increase if enabled. - CPU, RAM sized by instance (db-* machine types). |
Connectivity | - Public IP (with authorized networks) or Private IP (preferred if on same VPC). - Cloud SQL Proxy recommended (handles SSL/tunnels + IAM-based auth). |
Replication | - Read Replicas (for scale-out reads) up to 10 replicas. - Cross-region or in-region replicas; can replicate to external MySQL. - Promote replica → new standalone primary (no auto failover). |
High Availability | - Optionally enable HA (known as “regional” instance). - Creates synchronous standby in different zone, automatic failover → 99.95+% (varies by tier). |
Backups & PITR | - Automated or on-demand backups. - Point-in-time recovery requires binary logging (must be enabled). - By default 7 days of backup retained (configurable). |
Use Cases | - Traditional relational workloads needing strong ACID transactions. - Commonly used with external VM apps, GKE microservices, or serverless (using Cloud SQL Proxy). |
Cost | - Billed for CPU, memory, storage, backups, egress. - Different pricing for MySQL/Postgres vs. SQL Server (license included). |
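An illustrative HA (regional) MySQL instance plus a read replica (instance names, tier, region, and sizes are placeholders):

```bash
# Regional (HA) primary with automated backups and binary logging for PITR.
gcloud sql instances create prod-mysql \
  --database-version=MYSQL_8_0 \
  --tier=db-n1-standard-2 \
  --region=us-central1 \
  --availability-type=REGIONAL \
  --storage-type=SSD --storage-size=100GB --storage-auto-increase \
  --backup-start-time=03:00 \
  --enable-bin-log

# Read replica for scale-out reads.
gcloud sql instances create prod-mysql-replica \
  --master-instance-name=prod-mysql \
  --region=us-central1
```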
Aspect | Description |
---|---|
Definition | Google’s horizontal-scaling relational DB. Global, strongly consistent, highly available. 5 nines availability for multi-region. |
Key Features | - Relational model (SQL interface, schemas). - Synchronous replication for strong consistency. - Auto-sharding and high throughput, with “TrueTime” for global ordering. - Nodes are the capacity unit (CPU/RAM); can scale on the fly. |
Use Cases | - Mission-critical, globally distributed systems needing ACID transactions at scale. - Multi-region or global apps, high throughput (10k+ QPS). - e.g., financial trading, global inventory, gaming leaderboards with strong consistency. |
Replication & Regions | - Multi-region = 2 or more regions + witness, for a 5-nines SLA. - Regional instance = 1 region, multiple zones, 4-nines SLA. |
Cost | - ~0.90 USD/node/hr + storage (~0.30 USD/GB/mo). - Nodes provide CPU/RAM; can be scaled linearly. |
Aspect | Description |
---|---|
Definition | Fully managed wide-column NoSQL database for very large scale (TB–PB) with low latency and high throughput. |
Key Features | - Horizontally scalable (add more nodes for higher throughput). - Millisecond-level read/write latencies at large scale. - Integrated with Big Data / ML tools (Dataflow, Dataproc, HBase API). - Regional service; can enable multi-cluster replication for DR. |
Common Use Cases | - Time-series data (IoT, logs, sensor readings). - Ad tech or financial data ingest at massive scale. - Recommendation engines, personalization, real-time analytics. |
Cost | - ~0.65 USD/node/hr + storage usage + egress. - Not cheap, but extremely high performance at scale. |
Aspect | Description |
---|---|
Definition | Document-based NoSQL database with automatic scaling, high performance, and SQL-like queries (GQL). |
Datastore / Firestore | - Firestore is the next generation of Datastore; existing Datastore DBs are automatically migrated. - Firestore in Datastore mode = Datastore’s API on Firestore’s improved backend. |
Key Features | - ACID transactions (document-level). - Automatic scaling; strongly consistent queries by key. - GQL for queries. - Emulator available for local dev and testing. |
Use Cases | - Web/mobile user profiles, product catalogs, real-time data that needs simpler query patterns than relational. |
Aspect | Description |
---|---|
Definition | Serverless document DB for mobile/web app dev, real-time sync, offline support. |
Key Features | - Data organized as collections → documents → subcollections. - Real-time updates & offline mode for client apps. - Integrates with Firebase for mobile dev. - Automatic multi-region replication, 5-nines availability. |
Common Use Cases | - Mobile/web backends with real-time sync (chat, presence, user preferences). - Offline mode, frequently changing data. |
Aspect | Description |
---|---|
Definition | Fully managed in-memory data store (Redis or Memcached). Use as an application cache for high throughput & low latency data retrieval. |
Key Features | - Zero server ops (scalable, self-healing). - Deployed in a VPC, private IP only by default. - High availability & failover for Redis “Standard Tier.” - Great for session caching, caching frequently accessed queries, ephemeral data, etc. |
Common Use Cases | - Session caching for web apps. - Leaderboards, real-time counters. - Low-latency read access to data typically stored in slower or remote DBs. |
Service | Description | Transfer Mode | Target GCP Service |
---|---|---|---|
Storage Transfer Service | A fully managed online service that automates the transfer of data from external cloud storage providers or on-premises sources into Google Cloud Storage. | Online | Cloud Storage |
Transfer Appliance | A secure, physical hardware device designed for moving very large volumes of data to Google Cloud Storage. It is shipped to the customer, loaded with data, then returned for ingestion. | Offline | Cloud Storage |
BigQuery Data Transfer Service | A service that automates the movement of data from various external sources directly into BigQuery, helping to keep analytics data up to date. | Online | BigQuery |
Service | Type | Description | Use Cases |
---|---|---|---|
BigQuery | Data Warehouse | Fully managed, serverless data warehouse for real-time analytics using SQL. Supports batch and streaming data ingestion. | Business analytics, BI reporting, ML integration |
Pub/Sub | Messaging Service | Global, scalable messaging middleware for real-time event streaming. Publishers send messages to topics, and subscribers pull/push messages. | IoT data streams, event-driven systems, log ingestion |
Composer | Workflow Orchestration | Managed Apache Airflow service for ETL and data pipelines. Uses DAGs (Directed Acyclic Graphs) to define workflows. | Data pipelines, workflow automation |
Dataflow | Data Processing | Serverless, fully managed streaming and batch data processing using Apache Beam. | ETL, real-time data analytics, event stream processing |
Dataproc | Hadoop/Spark Clusters | Managed Hadoop, Spark, Hive, and Pig clusters. Easy to spin up/down clusters for temporary workloads. | Data lakes, Spark/MapReduce jobs, big data batch processing |
Cloud Datalab | Data Science IDE | Interactive Jupyter notebook-based environment for data exploration, analysis, and ML model development. | Data exploration, visualization, ML prototyping |
Dataprep | Data Cleaning | Serverless, visual tool for exploring, cleaning, and preparing data for analysis or ML. Auto-detects anomalies and outliers. | Data wrangling before feeding into BigQuery or ML models |
Service | Category | Description | Use Cases |
---|---|---|---|
AI Platform (Vertex AI) | ML Lifecycle Platform | Unified ML platform to train, deploy, and manage models. Supports TensorFlow, Scikit-learn, XGBoost, and more. | End-to-end ML model lifecycle |
BigQuery ML | ML in BigQuery | Run ML models directly inside BigQuery using SQL syntax. No need to move data. | Predictive analytics, forecasting |
AutoML | No-Code ML | Build custom ML models (vision, NLP, translation, tables) without needing deep ML knowledge. | Domain-specific custom models |
API | Category | Capabilities | Use Cases |
---|---|---|---|
Vision API | Image Analysis | Detect objects, faces, landmarks, and labels in images. OCR capabilities included. | Image moderation, product search |
Video Intelligence API | Video Analysis | Detect objects, activities, and speech in videos. Supports video annotation and scene change detection. | Video content tagging, surveillance |
Natural Language API | Text Analysis | Entity recognition, sentiment analysis, syntax analysis, and content classification. | Chatbots, document analysis |
Translation API | Language Translation | Translate text between 100+ languages. Supports glossary for domain-specific terminology. | Multilingual apps, e-commerce |
Speech-to-Text API | Speech Recognition | Converts spoken language into text in real-time. Supports multiple languages and noise robustness. | Voice commands, call center analytics |
Text-to-Speech API | Speech Synthesis | Converts text into natural-sounding speech. Supports 100+ voices in 20+ languages. | IVR systems, virtual assistants |
Dialogflow | Conversational AI | Build chatbots and voice bots with natural language understanding. Supports voice/text integration with Google Assistant. | Customer support bots, virtual agents |
Tool | Category | Purpose | Key Features |
---|---|---|---|
Cloud Monitoring | Metrics & Dashboards | Visualize resource health, create dashboards, set alerting policies, and monitor metrics across cloud services and VMs. | Uptime checks, multi-project monitoring, custom alerts |
Cloud Logging | Log Aggregation | Collects logs from GCP services, VMs, and on-prem systems. Allows log-based metrics and integrates with Monitoring. Logs are stored in log buckets and can be routed via log sinks to BigQuery, Pub/Sub, or Cloud Storage. | Log querying, export to BigQuery or Cloud Storage. Use the Ops Agent to collect logs from VMs. |
Error Reporting | Error Aggregation | Real-time error detection and aggregation. Automatically groups similar errors and tracks frequency and impact. | Language support (Go, Java, Python, Node.js, etc.) |
Debugger | Live Debugging | Debug production apps without stopping them. Set breakpoints and log points to inspect variables and stack traces. | Zero-downtime debugging, GitHub/GitLab integration |
Trace | Performance Tracing | Analyze app latency and request traces. Helps identify bottlenecks in microservices or API requests. | Distributed tracing, end-to-end latency insights |
Profiler | CPU & Memory Profiler | Continuously analyzes resource usage (CPU, memory) to optimize app performance. | Detect performance bottlenecks, low overhead profiling |
Use Case | Solution |
---|---|
Track CPU/memory usage of VMs | Cloud Monitoring + Ops Agent |
Create alert on GKE pod crashes | Cloud Monitoring Alerts |
Detect high error rates in app | Error Reporting + Cloud Logging |
Identify slow API requests | Cloud Trace |
Optimize app performance | Cloud Profiler |
Aggregate logs across services | Cloud Logging with Log-Based Metrics |
Trigger alerts based on logs | Log-based Metrics + Cloud Monitoring |